I was first motivated to write this book several years ago when I
proposed to a committee at our university that our sophomore statistics 
course be placed on the list of recommended courses for the general 
studies program. I was dismayed to discover that some committee 
members found it inconceivable that such a course could be necessary in 
this program, which makes up the liberal education part of a bachelor’s 
degree at our university. Upon reflection I decided that the committee
may have been correct; the way in which we were teaching the course 
was not suitable for a liberal education degree. 

My second motivation occurred in the fall of 1974, when I was 
teaching sophomore statistics. We were using a well-known elementary 
statistics book that accomplishes almost all that the author intends; 
however, it is a book in mathematical statistics with almost all of the 
mathematics stripped out of it. As a result, it is difficult for students to 
perceive the value of the study of statistics. 

This book is not subservient to mathematics; although the level of 
mathematics is elementary, the book seriously discusses statistical ideas. 
Students are generally intelligent individuals, even though some of them 
have little mathematical background. This book, I hope, provides an 
intellectual experience at the level of the student. It is suitable for a liberal 
education, for relating statistics to the history of people, to the
humanities, to science in the broad sense, and to the world of great ideas. 
It is not highly structured in terms of the usual sequence of ideas, from 
probability to estimation and testing. 

It briefly discusses the history of statistics. This includes the description
of states given by Aristotle, the Staatenkunde of the German and
Italian scholars, the political arithmetic of seventeenth-century England, 
the development of probability as it relates to gambling theory, and the
beginnings of experimental science as it relates to the statistics of 
experimental design. 

Information about the people involved in the development of statistics 
is also presented so that students will have some knowledge of Galton, 
Pearson, Fisher, and “Student.” 

Important ideas of statistics are introduced, such as randomization,
experimental error, the experimental principles of the Rothamsted 
school, the central limit theorem, the concept of an experimental design, 
the idea of a mathematical model on which to base statistical analyses, 
the idea of a conceptual population of repetitions, and the role of 
statistical methods in scientific research. 

Students should know about the great controversies that have torn 
but, at the same time, enriched and solidified the profession of statistics 
and about the controversies surrounding Bayesian statistics that have 
gone on for several hundred years. A knowledge of the disagreement 
between Fisher and Pearson is helpful in trying to comprehend the 
distinction between Fisherian and Neyman-Pearson statistics. 

Also, students should realize that the subject of statistics is still in the 
making, that the final chapter has not yet been written, that statisticians 
above all others do not fully understand how to collect or organize data. 

The organization of the book may strike some as unorthodox; there 
are thirty-six short lectures with no subtitles. However, this reflects the 
way in which I usually organize a one-semester course: one lecture on 
the normal distribution, one lecture on the Poisson, for instance. 
Instructors using this book can introduce their own subtitles. 

There is adequate material for a more traditional course in statistics. 
Such a course might proceed as follows: descriptive statistics (Lectures 4
to 6), probability (Lectures 7 and 8), probability distributions (Lectures
10 to 12), sampling distributions (Lectures 13 to 15), testing (Lecture 17),
estimation (Lecture 18), correlation and regression (Lectures 20 and 23),
t tests (Lecture 29), analysis of variance (Lecture 30), and chi-square
(Lecture 34).
QUIPU. The ancient Inca system of recording vital statistics. Courtesy of the
American Museum of Natural History. 
ARITHMETIC


1.1 And the Lord spake unto Moses in the wilderness of Sinai, in the tabernacle 
of the congregation, on the first day of the second month, in the second year 
after they were come out of the land of Egypt, saying, Take ye the sum of all
the congregation of the children of Israel, after their families, by the house
of their fathers, with the number of their names, every male by their polls: 
From twenty years old and upward, all that are able to go forth to war in 
Israel: thou and Aaron shall number them by their armies. 

The fourth book of the Old Testament begins with this instruction to 
Moses to conduct a census of the fighting men of Israel. In the passages 
that follow we are given the results of that early census, conducted about 
1500 B.C. Actually, we know that censuses were carried out in much
earlier times for purposes of taxation. Censuses in ancient Babylonia,
China, and Egypt apparently were taken as early as 3000 B.C.


1.2 One of the most interesting accounts of an early census is given in the 
twenty-fourth chapter of the Book of Second Samuel. “And again the 
anger of the Lord was kindled against Israel and He moved David 
against them to say, Go, number Israel and Judah.” David instructed a 
reluctant Joab to make a census of the people to determine the number of 
fighting men. It is recorded that because David did this, divine wrath was 
visited on Israel and that 70,000 men died of a pestilence. 

The census of King David (c. 1000 B.C.) and the resulting punishment
seem to have provided a basis for future public resistance to censuses.
Governor Hunter of New York reported in 1712 [2], “I have issued out
orders to the several counties and cities for an account, of the numbers of 
their inhabitants and slaves but have never been able to obtain it 
compleat, the people being deterr’d by a simple superstition and
observation, that the sickness follow’d upon the last numbering of the 
people.”* When we consider that the early censuses were precursors of 
military drafts and tax collectors, it is not surprising that they were 
resisted by the populace. 


1.3 The word census itself is derived from the Latin word censere, which
means to tax. The Roman census was established by the sixth king of
Rome, Servius Tullius (578-534 B.C.). Under this system Roman officials
called censors made a register of the people and their property at 5-year
intervals, for taxation purposes and for determining the number of
able-bodied fighting men [4]. In 5 B.C. Caesar Augustus extended the census
to include the entire Roman Empire. Thus it is that we have the
beginning verse in the beautiful and traditional Christmas story: “And it
came to pass in those days, that there went out a decree from Caesar
Augustus, that all the world should be taxed.” To register for such a
taxation, Joseph and Mary journeyed to Bethlehem, where the infant
Jesus was born. The last regular Roman census was conducted in 74 A.D.
With the collapse of the Roman Empire, regular periodic censuses were
not conducted in the Western world until the seventeenth century.


1.4 The name statistics can be traced to the Latin words status, meaning
state, and statista, meaning statesman. Aristotle (384-322 B.C.) [5] was
born in Macedonia, studied under Plato in Athens, and was a tutor to
Alexander at the request of King Philip. He established his own school
in Athens when Alexander inherited the throne. The Politeiai of Aristotle 
contained a description of 158 states. This initial attempt at the 
comparative description of states was subsequently developed by Italian 
and German authors into a subject called statistics (Staatenkunde in 
German). Westergaard [6] traces the development of the description of 
states. 


1.5 During the Middle Ages the system of feudalism more or less rendered
national censuses impossible, although there were attempts to revive
them. One notable example is the breviary of Charlemagne in 808 A.D.

At Christmas in 1085, William the Conqueror ordered a statistical
survey of England. The record of this survey is contained in the
Domesday Book [4]. The survey collected information on land, landowners,
land use, tenants and servants, and livestock and served as the
basis for taxes until 1522, when a new Domesday Book was completed.

* From E. B. O’Callaghan, ed., Documents Relative to the Colonial History of the State of
New York, Vol. V (Albany, N.Y.: Weed, Parsons & Co., 1855), p. 339.

1.6 Early in the sixteenth century Bills of Mortality (published summaries) 
began to appear in London. David [3] traces the beginning of these bills 
to an order from Thomas Cromwell, acting on behalf of Henry VIII. It 
has been speculated that the king desired these summaries because of his 
great fear of the plague. However, Cassedy [2] indicates that it is very 
difficult to determine the beginning of the bills and that the precise date is 
unknown. In the beginning the Bills of Mortality recorded only deaths 
from the plague. Over the years they were expanded to include 
christenings and, around the end of the sixteenth century, data on deaths 
from other diseases. 

1.7 The Spaniards conducted very early censuses in the Americas. A census
of Peru in 1548 was carried out by the Spanish viceroy, Don Pedro de la
Gasca. This census is described by Carlos A. Uriarte in the March 1949
issue of Estadistica. Before the Spaniards came, the Incas had their own
system of recording statistics. This system used intertwined colored 
strings and knots known as quipus. Cassedy [2, p. 3] quotes historian 
William H. Prescott in his description of the system of quipus as a 
method “which has scarcely a counterpart in the annals of a semicivilized 
people. A register was kept of all the births and deaths throughout the 
country and exact returns of the actual population were made to 
government every year by means of the quipus.”* 

1.8 In seventeenth-century England there was great interest in the so-called
political arithmetic, which consisted largely of analyses of recorded
births and deaths. In 1662 John Graunt published his first and only
book, a remarkable work entitled Natural and Political Observations
upon the Bills of Mortality. Despite the unreliable nature of the
data contained in the Bills of Mortality, Graunt made an exhaustive 
study of the information contained therein and noted many regularities 
and irregularities. For example, he noted that the fraction of male births 
is almost exactly that of female births, the fraction of male births being 
slightly greater. This observation, well known in our day, seems to have 
been new and surprising in 1662. Over a long series of years, he counted
the christenings of 139,782 boys compared with 130,866 girls. Graunt
made a determined if awkward attempt to develop a mortality table of
the type used by insurance companies today.

* Quoted in Handbook of Vital Statistics (New York: Statistical Office of the United Nations,
1955), p. 4.
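Graunt's christening counts make a convenient small exercise in arithmetic. The Python sketch below is our own illustration (the book contains no code, and the variable names are our choices); it computes the share of christenings that were boys and the number of boys per 100 girls, a modern way of stating the excess Graunt noticed.

```python
# Graunt's London christening counts, as quoted in the lecture
boys = 139_782
girls = 130_866

total = boys + girls                      # all christenings in the series
share_male = boys / total                 # fraction of christenings that were boys
boys_per_100_girls = 100 * boys / girls   # the same comparison as a ratio

print(f"total christenings: {total:,}")                 # total christenings: 270,648
print(f"fraction male: {share_male:.4f}")               # fraction male: 0.5165
print(f"boys per 100 girls: {boys_per_100_girls:.1f}")  # boys per 100 girls: 106.8
```

About 51.6 percent boys, or roughly 107 boys for every 100 girls: very close to an even split, yet consistently above it.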

1.9 The word statistics was coined by the German scholar Gottfried 
Achenwall about the middle of the eighteenth century. Of course, it was 
derived from the word status and the German counterpart of political 
arithmetic. The word was apparently used for the first time in Great 
Britain by Sir John Sinclair who, in a series of volumes published 
between 1791 and 1799, gave a statistical account of Scotland drawn up 
from the communications of the ministers of the different parishes. Yule 
[7] quotes Sinclair as saying, “Many people were at first surprised at my 
using the new words Statistics and Statistical, as it was supposed that 
some term in our own language might have expressed the same 
meaning.”* It is hard to believe today that such a short time ago the 
word statistics was considered new. 

Boorstin [1] points out that statistics appeared in the Encyclopaedia 
Britannica in 1797. He also mentions that for a time the word publicistics 
competed in literary use. It is interesting to speculate about the course of 
events if publicistics had won. Would there have been an American 
Publicistical Association or a Royal Publicistical Society? 

1.10 When the Constitution of the United States was written, the census 
became a regular and vital part of the government. The census was 
provided for in Article I, Section 2, which states, “The actual enumeration
shall be made within three years after the first meeting of the
Congress of the United States, and within every subsequent term of ten 
years, in such manner as they shall by law direct.” The first decennial 
census in the United States was taken in 1790, and other censuses have 
followed every 10 years since. In addition, censuses in certain fields have 
been taken at more frequent intervals. 

1.11 In this lecture we have sketched a few of the origins of census data and
the analysis of such data. The subject of statistics, which owes its name to
the description of states, has expanded far beyond these original
boundaries, but the modern versions of political arithmetic and
description of states constitute an important part of statistics.

* Reproduced by permission of the publishers, Charles Griffin & Company, Ltd., of
London and High Wycombe, from Yule, An Introduction to the Theory of Statistics, 5th
Ed., 1919 (15th Ed., Yule & Kendall, 1950).

SUMMARY. The root word of statistics is state, and the description of
states constitutes one of the important roots of modern statistics. Partial 
descriptions of nations and states were provided by the early censuses, 
which were conducted in order to raise armies and establish tax rolls. 
Many examples of censuses can be found in the Old Testament, and the
New Testament contains the account of a census of the Roman
Empire conducted by Caesar Augustus.

German and Italian scholars developed the description of states into a 
subject that more nearly resembles modern statistics. Under William the 
Conqueror, a detailed statistical survey was made of England and 
recorded in the Domesday Book. In seventeenth-century England 
political arithmetic flourished. Consisting largely of analyses of recorded 
births and deaths, this provided some of the first mortality tables. 

The census was required by the Constitution of the United States, 
although the word statistics itself was not firmly established until 1797, 
when it first appeared in the Encyclopaedia Britannica. 
EXERCISES


1. Give a summary of the statistical information available from the 
U.S. Census Bureau. 
2. Give a summary of statistical information based on sampling
provided by the U.S. Census Bureau. 

3. What was the population of the United States as determined by the 
1790 census? 

4. Find censuses referred to in the Bible other than those
mentioned in the lecture.

5. Try to determine if your hometown has any local statistical record. 

6. Try to find characteristics of different countries that are not 
described statistically. 

7. The Statistical Abstract of the United States gives detailed information
on many characteristics of the United States. Using the most
recent edition, what would you say is the windiest city? The hottest 
city? The sunniest city? What qualifications do you wish to attach to 
your answer? 

8. Using the 1977 County and City Data Book issued by the U.S. 
Bureau of the Census, take a random sample of 10 counties from 
your state. Estimate the percent unemployed for the state using only 
the county figures. How close is your estimate to the figure reported 
for the state? 

9. Using data from the Statistical Abstract, graph crude oil production 
in the United States from 1950 to the present. 

10. To attack the major problems confronting humanity (hunger, war, 
depletion of raw materials, disease, pollution, etc.), can you think of 
any statistical information that should be maintained and is not? 
2.1 The random event came slowly to Western civilization. Games of chance
were played thousands of years before Christ and continued to be played
in times dated A.D., despite vigorous opposition from the Church. Yet
explicit formulation of the concept of a random event and the
subsequent theory of probability came late; some historians date the
beginnings of probability theory as recently as 1500 A.D.


2.2 The astragalus is a bone that lies above the talus, or heelbone. In ancient 
and medieval literature, the words astragalus, talus, knucklebone, and 
hucklebone were used indiscriminately. Interesting and scholarly
accounts of the use of the astragalus in early games of chance are given by
David [2, 3]. We know from archaeological findings that large numbers
of astragali, particularly from hooved animals, were collected in ancient
times. We also know that these bones were used in games of various
kinds. As pointed out by David, the astragalus was certainly used in
various board games in Egypt (c. 3500 B.C.); a game called “hounds and
jackals” by the excavators of Egyptian tombs apparently used astragali
in the same manner as dice. The hounds and jackals were moved
according to the results of throwing the astragali. David notes that
Homer (c. 900 B.C.) tells us that Patroclus became angry with his
opponent while playing a game of knucklebones and killed him.

By the time of the Romans, gaming was very popular and laws were 
passed forbidding and regulating it. Stern opposition by the Church 
followed and continues today but, despite all such opposition, gaming 
and gambling have flourished among all classes of people. In the early 
days most games involved the astragalus; dice and cards were used later. 
According to David, the earliest dice found so far date from the
beginning of the third millennium B.C. A die found in northern Iraq and made
of well-fired buff pottery has the opposite points in consecutive order: 2
opposite 3, 4 opposite 5, and 6 opposite 1. David suggests that the die
with opposite faces totaling 7 must have evolved about 1400 B.C.

In addition to their use in games, chance mechanisms have been used
throughout the ages to divine the will of the gods. David records many 
such practices and gives an interesting account of such a practice in fairly 
recent times. As recently as 1737, John Wesley sought guidance by 
resorting to the practice of drawing lots to decide whether or not to marry. 


2.3 Although references to games of chance are numerous and span 
thousands of years, elementary concepts of probability appear in the 
literature only in the last few hundred years. Why was the theory of 
probability so slow in being developed? In an article on the history of 
probability, Kendall [4] raises this question and offers an opinion. He 
suggests that it was because of moral and religious barriers. It certainly 
seems plausible that the opposition of the Church may have been based 
on theological implications and not on social consequences of gambling. 
However, the explanation offered by Kendall remains speculative. 

H. Walker [8] cites a reference given by Smith [6] that suggests
some idea of probability in Chinese literature as early as 220 B.C.
Kendall draws our attention to an early example of counting the
possibilities in dice games. Bishop Wibold, about 960 A.D., enumerated
56 virtues, letting each virtue correspond to the outcome of the throw of
three dice. When three dice are thrown, there are 56 possible outcomes
when we ignore the order (e.g., 2,2,1 is the same as 1,2,2). The dice were
then thrown, and the thrower concentrated on the corresponding virtue 
(honesty, loyalty, etc.) for some time. There are many references to 
counting problems associated with games of chance prior to the 
development of ideas of probability. David references the Chinese writer 
Chu Shih-chieh, who published the arithmetical triangle of binomial 
coefficients in 1303 while describing it as an ancient method. He gives a 
diagram similar to that in Figure 2.1. This triangle array is known today 
as Pascal’s triangle, although it was published by Pascal 351 years after
Chu Shih-chieh. We defer a fuller explanation until later, when we will have
several occasions to make use of this triangle. 
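Wibold's count of 56 can be checked by brute force. The Python sketch below is our own illustration, not part of Kendall's account: it lists the unordered outcomes of three dice, compares them with the 216 ordered outcomes, and shows that 56 is also the binomial coefficient C(8, 3), one of the entries of the arithmetical triangle. The rule C(n + k - 1, k) for choosing k items from n kinds with repetition is a standard counting result that we bring in; it is not attributed to Wibold.

```python
from itertools import combinations_with_replacement, product
from math import comb

faces = range(1, 7)

# Unordered outcomes: (1, 2, 2) is a single outcome however the dice fall,
# which is the way Wibold counted
unordered = list(combinations_with_replacement(faces, 3))

# Ordered outcomes: the dice are distinguishable, so (2, 2, 1) and (1, 2, 2) differ
ordered = list(product(faces, repeat=3))

# Choosing 3 faces from 6 kinds with repetition allowed: C(6 + 3 - 1, 3) = C(8, 3)
by_formula = comb(6 + 3 - 1, 3)

print(len(unordered), len(ordered), by_formula)  # 56 216 56
```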

2.4 One of the earliest examples of a gambling problem in a mathematical
work is given in the work of Fra Luca Pacciolo called Summa de
arithmetica, geometria, proportioni e proportionalita, which was printed
in Venice in 1494. He gives the problem of how to divide equitably the
stakes between two players when a game is interrupted before its
conclusion. This is one of the earliest versions of the problem of points.
Walker mentions that this problem was repeated in almost all
subsequent works on probability.

Although not published until 1663, the first work containing extensive
material on probability is The Book on Games of Chance (Liber de ludo
aleae) by Cardano [1]. Cardano wrote this book about 1520 while he was
at the University of Padua. In addition to being a handbook on
gambling, it represents one of the first examples of putting the odds for
chance events into mathematical terms. Although he made numerous
errors, Cardano solved many problems, formulating them in terms very
similar to those of today.

2.5 It is of interest that Kepler made some remarks on chance in a work
published in 1606 (see Todhunter [7, p. 4]). Kepler was speculating
about the cause of the appearance of a new star that shone brightly in
1604. Todhunter also mentions that Galileo made some contributions to
probability in a work published in 1642. Galileo solved a problem
involving the throw of three dice. He showed that of the 216 possible
cases, there are 27 that give the number 10 and 25 that give the number 9.
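Galileo's counts are easy to confirm by listing all 6 × 6 × 6 = 216 ordered throws. The Python sketch below is our own check, not Galileo's method. It also shows why the question needed answering: if order is ignored, 9 and 10 can each be made in six ways, so only the count of ordered cases separates them.

```python
from collections import Counter
from itertools import product

# Tally the total of every ordered outcome of three distinguishable dice
throws = list(product(range(1, 7), repeat=3))
totals = Counter(sum(roll) for roll in throws)

print(sum(totals.values()))   # 216
print(totals[10], totals[9])  # 27 25

# Ignoring order, each total corresponds to only six distinct combinations of faces
partitions = {t: {tuple(sorted(r)) for r in throws if sum(r) == t} for t in (9, 10)}
print(len(partitions[9]), len(partitions[10]))  # 6 6
```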
2.6 The next major recorded work in probability is found in correspondence
between Blaise Pascal and Pierre de Fermat. Pascal and Fermat were 
two of the most distinguished mathematicians of their time. Pascal 
(1623-1662) established lasting fame as a physicist and mathematician 
by the age of 25, when he retired to a life of religious contemplation. 
Fermat (1608-1665) stated a number of remarkable propositions in the 
theory of numbers. Todhunter notes, “Neglecting the trifling hints which 
may be found in preceding writers we may say that the Theory of 
Probability really commenced with Pascal and Fermat; and it would be 
difficult to find two names which could confer higher honour on the 
subject.”* 

The Chevalier de Mere is famous for having posed questions that 
started the Pascal-Fermat correspondence, although little is known 
about de Mere. David says that “It has never been stated how Pascal and 
de Mere began to discuss such problems but it may be noted that this 
contact was made during what might be called Pascal’s dissolute 
period. Since de Mere was not a member of the continuance of what has 
been the Mersenne Academy, they may have met in less academic 
surroundings.”† The speculation is frequently made that the
Chevalier’s questions were motivated by his occasional visits to the casino.
Apparently de Mere posed the problem of points to Pascal. Suppose a 
gambler is undertaking to throw a six with a die in eight throws. If he 
does not succeed in the first three throws, what proportion of the stake 
should he be entitled to if he gives up his fourth throw? Pascal and 
Fermat carried on an extensive correspondence about this and other 
problems of points. 

Among the various dice games of that age was the following. The 
house offers to bet even money that in four throws of a single die at least 
one six will occur. It is claimed that de Mere was interested in a slightly 
more complicated game. Suppose that two dice were thrown. The
thinking of the day argued that since two dice can result in six times as
many outcomes as one die, 24 (6 × 4) throws of two dice should give the
house a favorable bet that at least one double six will occur. It
is believed by some that de Mere had obtained empirical evidence (in the
casino, no doubt) that in 24 throws the odds were against the house.
Pascal showed that the probability of at least one double six was, in fact,
only about .491.
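The arithmetic behind Pascal's answer fits in a few lines. In the Python sketch below (our illustration; the names, the seed, and the simulation are our additions, not Pascal's method), throws are treated as independent, so the chance of at least one success in n throws is 1 minus the chance of failing every time. The house's four-throw bet on a single six comes out just above 1/2, while the 24-throw bet on a double six comes out just below it. A short simulation, in the spirit of Exercise 3 at the end of this lecture, gives roughly the same figure.

```python
import random
from fractions import Fraction

# Exact chance of at least one success in n independent throws:
# 1 - (chance of failing on a single throw) ** n
p_six_in_4 = 1 - Fraction(5, 6) ** 4             # one die, four throws
p_double_six_in_24 = 1 - Fraction(35, 36) ** 24  # two dice, twenty-four throws

print(f"{float(p_six_in_4):.4f}")          # 0.5177
print(f"{float(p_double_six_in_24):.4f}")  # 0.4914

# Smallest number of throws of two dice for which the bet favors the house
n = 1
while 1 - Fraction(35, 36) ** n <= Fraction(1, 2):
    n += 1
print(n)  # 25


def has_double_six(rng: random.Random, throws: int = 24) -> bool:
    """Throw two dice `throws` times; report whether a double six appeared."""
    for _ in range(throws):
        first, second = rng.randint(1, 6), rng.randint(1, 6)
        if first == 6 and second == 6:
            return True
    return False


# Monte Carlo check; the seed and the number of trials are arbitrary choices
rng = random.Random(1654)
trials = 20_000
estimate = sum(has_double_six(rng) for _ in range(trials)) / trials
print(f"simulated: {estimate:.3f}")
```

With 20,000 simulated sessions the estimate will typically land within about 0.01 of .491; the exact fraction is the sharper guide, which is the point of Pascal's calculation.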

* Reprinted with permission from I. Todhunter, A History of the Mathematical Theory of
Probability (Chelsea Publishing Company, 1949).

† Reproduced by permission of the publishers, Charles Griffin & Company Ltd., of
London and High Wycombe, from David, Games, Gods, and Gambling, 1962.



15 THE CASINO 


Another item in the Pascal-Fermat correspondence is the arithmetical 
triangle referred to earlier. Although Pascal was one of the last of a long 
line of discoverers of this device, it is interesting that it is generally known
as Pascal’s triangle. 

2.7 From the seventeenth century to the present, the theory of probability 
has developed far beyond its simple origins of dealing with games of 
chance. The story of these years is unbelievably rich, and we will 
conclude this lecture with only a few fragments. The story includes the 
work of Leibnitz, published in 1666, in which a table similar to the 
arithmetical triangle is given. Leibnitz gives a curious notation. When a 
set of things is taken two at a time, he uses com2natio; when three at a 
time com3natio; when four at a time com4natio; etc. Leibnitz used +,
-, = in their present sense. He denotes multiplication and division by
∩ and ∪, symbols now used in set theory for intersection and union, the
set-theoretic counterparts of multiplication and addition. Curiously, he
used the word productum to denote a sum, as in 8 is the productum of 5 + 3.

Another version of the arithmetical triangle is given in the Ars
Conjectandi, published in 1713, eight years after the death of its author,
James Bernoulli (1654-1705). In many ways this work is extremely
important. Bernoulli maintained emphatically that there was
application for probability in civil, moral, and economic affairs. However, it
remained for those who came later to find applications. 

In a paper published on November 12, 1733 and distributed only to 
friends, De Moivre gave the famous normal curve. Walker points out that 
this origin of the normal curve was discovered by Pearson [8]. 

In the late 1700s and early 1800s, when the United States was in its 
infancy, the normal curve was being firmly established by the great 
mathematicians Laplace and Gauss. Gauss made use of the normal 
curve in the analysis of astronomical data. Given De Moivre’s priority, it is
curious that the normal curve is widely known as the Gaussian curve or distribution.

Bernoulli’s belief that probability would come to be applied in many
fields of human endeavor has been justified. Much remains to be done,
far removed from the casino, but the gambling origins remain a vital and
interesting part of the subject.


SUMMARY. Games of chance have been played from thousands of 
years before Christ to the present. All sorts of chance mechanisms have 
been used, but one of the earliest involved the throwing of polished bones 
called astragali. Dice and cards came much later. 
Although games of chance and chance mechanisms were well known,
no adequate theory of probability existed. No real development of the 
subject even began until the middle of the seventeenth century. Certainly 
some traces of the subject can be found much earlier, but the 
development of probability as a serious subject began in the Western 
world with the work of Pascal and Fermat. Their work, directed 
specifically toward problems in gambling, started probability on its 
path. 

With the work of men such as Bernoulli and De Moivre in the 
eighteenth century and Gauss and Laplace in the nineteenth century, the 
theory of probability was ready for the torrent of statistical development 
that began in the late nineteenth century. 
EXERCISES

1. Prepare a short report on the gambling origins of probability. Most 
encyclopedias contain material that will aid you. 

2. Try to discover the rule for generating successive rows of Pascal’s 
triangle. 





3. Throw a pair of dice 24 times and record the number of occurrences of
a double six. Repeat this exercise 20 times and record the number of
times that at least one double six occurs. Is this result in reasonable
agreement with Pascal’s result?

4. A carnival game offers a prize of $5 if you get at least one six in three 
tosses of a die. What would you be willing to pay for a chance to play ? 
What do you feel are the odds in favor of your winning? After 
throwing the die once and not getting a six? After throwing twice 
without producing a six? 

5. Two women, meeting again at a class reunion, agree to play the 
following game. The person with less money in her wallet will
lose all of her money to the other. Each is thinking, “If I lose, I will lose 
less than I will win, if I win. So the game is in my favor.” Where is the 
fallacy? 

6. A magazine promotion campaign sponsored a $100,000 sweepstakes. 
Each person receiving an entry blank was given the chance of winning 
the grand prize of $100,000. If the chance of winning is 1 out of 
100,000, is it worth the postage stamp to enter? 

7. A professional gambler wishes to purchase a pair of loaded dice. He 
hires an artisan to build a pair of dice for which the probability of 
rolling a pair of ones is exactly 1 in 64 instead of 1 in 36. He pays for 
the dice and is told that every possible calculation has been made and 
that the desired probability is undoubtedly 1/64. However, the dice 
have never been thrown. Would you believe the artisan’s claim? 




MENDEL'S GARDEN. The small monastery garden where the pea
experiments were conducted. Courtesy of Professor Dr. Ing. Jaroslav
Kříženecký.


THE SCIENTISTS


3.1 In the spring of 1969, a series of three lectures at the Johns Hopkins 
University [2] dealt with an ancient theme—the relationship between 
observation and theory. The lecture by Ernest Nagel is entitled “Theory 
and Observation” and begins by discussing Einstein’s views on the 
antithesis between the deductive and empirical components of science. 
Nagel quotes Einstein as saying, “... purely logical thinking cannot yield 
us any knowledge of the empirical world; all knowledge of reality starts 
from experience and ends with it.”* 

The logical status of scientific theories in their relation to observation 
has concerned scientists and philosophers for centuries. In my view, 
modern statistics helps to relate observation to theory and is therefore 
an integral and vital part of scientific thinking. However, statistical 
methods based on statistical theory have existed for only about 100 
years. In this lecture we wish to mention a few of the people and ideas 
related to the development of statistics in science. 


3.2 In 1620 Francis Bacon published the Novum Organum. In this
remarkable document, Bacon makes certain pronouncements on the
scientific method. Without direct experience in the laboratory, Bacon
concludes that we can learn only through observation. “Man, as the
minister and interpreter of nature, does and understands as much as his
observations on the order of nature, either with regard to things or the
mind permit him, and neither knows nor is capable of more.”† [4, p. 37]


* From Nagel, Bromberger, and Grünbaum, Observation and Theory in Science,
copyright © 1971 by the Johns Hopkins Press.

† Origins of Science, ed. by George Schwartz and Philip W. Bishop,
copyright © 1958 by Basic Books, Inc., Publishers, New York.
He proposed investigating all facets of the universe by collecting all
possible data and performing every conceivable experiment. Then the 
data were to be tabulated and analyzed. Although Bacon’s proposal of 
such an exhaustive task seems naive, it represented an important 
endorsement of the empirical method. 

From the very beginning of science, of course, data were analyzed and 
interpreted. However, this did not resemble what we call statistics today 
until the ideas of probability and political arithmetic began to be merged 
in the nineteenth century. 


3.3 As early as 1783, Laplace proposed the normal curve equation as being 
appropriate for the probability distribution of errors. This curve is given 
in Figure 3.1. 
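The lecture does not write the equation out. In its standard modern form, with μ marking the center of the curve and σ its spread (symbols introduced here for illustration, not taken from the original), the normal curve of Figure 3.1 is

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty .
$$

For a law of errors one takes μ = 0, so the curve is centered at zero error and is symmetric about it.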


Figure 3.1 Normal Probability Curve.

As noted in Lecture 2, this curve was derived previously by De Moivre in 
1733 in an obscure paper that was uncovered in 1924 by Karl Pearson. In 
1809 Gauss published his great work on the theory of motions of 
heavenly bodies [1]. In Section III of Book II of this work he derives the 
normal curve as appropriate for a law of errors, at the same time 
acknowledging Laplace’s earlier derivation. The basic idea is that if
each error of observation is made up from the addition of many small,
independent errors, then positive errors will be as likely as negative
errors, and most errors will be concentrated close to zero, as in Figure 3.1.
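The idea can be checked with a small simulation. This is a sketch under stated assumptions, not Gauss’s derivation: each observation error is taken to be the sum of 100 tiny independent errors, each equally likely to be +1 or -1 unit. The sizes, counts, and seed are arbitrary choices for illustration.

```python
import random

def composite_error(rng, n_components=100):
    """One observation error: the sum of n_components small, independent
    errors, each equally likely to be +1 or -1 unit (an illustrative assumption)."""
    return sum(rng.choice((1, -1)) for _ in range(n_components))

def simulate_errors(n_obs=5000, n_components=100, seed=1):
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    return [composite_error(rng, n_components) for _ in range(n_obs)]

errors = simulate_errors()
n = len(errors)
positive = sum(e > 0 for e in errors)
negative = sum(e < 0 for e in errors)
# The total can range from -100 to +100, yet most totals stay near zero.
within_10_units = sum(abs(e) <= 10 for e in errors) / n

print("positive:", positive, "negative:", negative)
print("fraction within 10 units of zero:", round(within_10_units, 3))
```

Positive and negative totals come out in roughly equal numbers, and about seven totals in ten lie within 10 units of zero even though the possible range is -100 to +100. A tally of the values traces out a bell shape like Figure 3.1.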

With the use of the normal probability curve by Gauss and Laplace, 
the theory of probability was firmly launched on its way to applications 
in almost all fields of scientific endeavor. Bernoulli’s belief that 
probability could have applications was beginning to be justified. 
The great Belgian mathematician and astronomer Adolphe Quetelet
(1796-1874) made a giant step in the development of statistics in the 
social sciences when he extended the use of the normal curve from errors 
of observation to data of various kinds. He observed that if the heights of 
a large number of men were shown on a bar graph, the picture resembled 
the normal curve. Thus he began the subject of social statistics. Although 
Quetelet is best known for his moral statistics, the effort to quantify 
psychological and social facts and customs, he recognized the universality
of the emerging subject. Walker [6, p. 41] notes that Quetelet
presented to the British Statistical Association in 1841 a list of more than 
40 topics that should be investigated by statistical methods. 

One of the intriguing theories of Quetelet was the theory of l’homme
moyen, or the average man. Apparently he felt that the average man, if 
such existed, represented all that was good and beautiful. 


3.4 In the latter half of the nineteenth century, statistics played a major role 
in the study of the laws of heredity. Darwin returned from the voyage of 
the Beagle to publish The Origin of Species in 1859. This work had a 
profound effect on many of the great minds. 

In Austrian Silesia, now in Czechoslovakia, Gregor Mendel had 
returned to Brno as a supply teacher in July 1853. In 1856 he embarked 
on his experiments with the edible pea, Pisum sativum. In this work, 
which continued until 1863, Mendel discovered that the laws of heredity 
are statistical. He selected seven of the characteristics on which peas 
differ: seed shape, seed color, pod shape, color of seed coat, color of 
unripe pods, position of flowers, and length of stem. To explain the idea, 
consider color. Mendel crossed green peas with yellow peas; the result 
was that all the next generation were yellow. With each of the other six 
characteristics, he obtained similar results. The new generation of peas 
always resembled one of the parents. 

When a second generation of peas was bred from the yellow peas of 
mixed parentage, about three-quarters of the peas were yellow, the 
others green. Mendel’s data are given by Sootin [5]. 




Yellow Seeds    Green Seeds    Ratio of Yellow to Green Seeds
    6022            2001                3.01 to 1





The agreement with a 3:1 ratio is remarkable, and a similarly strong 
agreement exists in Mendel’s data on the other six characteristics. 
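The comparison with 3:1 can be made concrete with a few lines of arithmetic, here in Python. The sketch only restates the table above: it computes the observed ratio and the counts that an exact 3:1 split of the same seeds would give.

```python
# Mendel's second-generation seed counts, from the table above.
yellow, green = 6022, 2001
total = yellow + green

observed_ratio = yellow / green
# Counts that an exact 3:1 ratio would predict for the same total.
expected_yellow = total * 3 / 4
expected_green = total * 1 / 4

print("total seeds:", total)
print("observed ratio: %.2f to 1" % observed_ratio)
print("expected under 3:1:", expected_yellow, "yellow,", expected_green, "green")
```

Out of 8023 seeds, an exact 3:1 split would give 6017.25 yellow and 2005.75 green, so each observed count differs from it by fewer than 5 seeds.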

Mendel presented his work at two meetings of the Naturforschender
Verein (Natural History Society) in Brno in 1865 and published the
results in a paper in 1866. Curiously, his work was either unnoticed or
ignored until about 1900, when it was rediscovered. The 3:1 ratio results
from the following simple explanation. The pure yellow and the pure
green peas each carry two copies of the color characteristic.


Pure Yellow, YY
Pure Green, gg

When these peas are crossed, there is a random selection of one of the
characteristics from each, resulting in Yg in the first (F₁) generation.
When hybrid peas from the F₁ generation are crossed again, there is a
random selection of one of the characteristics from each of the parents,
resulting in the following in the second (F₂) generation:

YY Yg gY gg 

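These four combinations are equally likely, and Yg and gY are the same
mixed type, so Mendel was led to the 1:2:1 ratio of YY to mixed to gg. In
the case where one characteristic is dominant over the other, as with
yellow over green, the YY peas and the mixed peas all look yellow, so the
observed ratio of yellow to green is 3:1.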
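Listing the four equally likely combinations and counting them reproduces both ratios. A short Python sketch, using the letters Y and g from the text (the variable names are illustrative):

```python
from collections import Counter
from itertools import product

# An F1 hybrid carries one Y and one g; each parent passes on one at random.
parent = ("Y", "g")

# All equally likely F2 combinations: (from first parent, from second parent).
offspring = ["".join(pair) for pair in product(parent, parent)]

# Yg and gY are the same mixed type, so sort the letters before counting
# ("Y" sorts before "g" because uppercase letters come first).
types = Counter("".join(sorted(o)) for o in offspring)

# Y is dominant: any pea carrying a Y is yellow.
colors = Counter("yellow" if "Y" in o else "green" for o in offspring)

print(offspring)     # ['YY', 'Yg', 'gY', 'gg']
print(dict(types))   # {'YY': 1, 'Yg': 2, 'gg': 1}
print(dict(colors))  # {'yellow': 3, 'green': 1}
```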

Mendel carried his research to the third generation, but we will not 
discuss that. From today’s viewpoint it is hard to appreciate the 
magnitude of his ideas. But Mendel was not operating from such a 
viewpoint. His idea, which seems so simple now, must rank among the 
truly great ideas in history. Robert Olby, in the Author’s Note to The 
Origins of Mendelism [3], quotes from a B.B.C. talk: “Often as Newton
said, the successor can see further than his precursors because, standing 
on their shoulders, he can see a little further than they. There was nothing 
of this said about Gregor Mendel. He had no precursors to stand on at 
all.”* 

In England the work of Darwin had a profound effect on his cousin, 
Sir Francis Galton. Better equipped mathematically than Darwin to 
deal with a theory of evolution, Galton published many articles and 
books on heredity. Hereditary Genius was published in 1869 and Natural 
Inheritance in 1889. Galton was the first to use correlation and 
regression, concepts of tremendous importance that we will discuss in 
later lectures. 


* Reprinted by permission of Schocken Books Inc. from Origins of Mendelism by Robert
C. Olby. Copyright © 1966 by Robert C. Olby.
It is certain that Galton knew of the 1:2:1 Mendelian ratio and that
he transmitted this ratio to Darwin in a letter written in 1875 [3, p. 72].
Galton mentions the Pascal triangle as one way of arriving at the ratio 
(the third line of the triangle is 1:2:1). Neither Darwin nor Galton seems 
to have appreciated the importance of this ratio. 
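The connection with the triangle is that its third line, 1 2 1, counts the ways two random draws can give two Y’s (YY), one Y (Yg or gY), or none (gg). A short Python sketch that builds the triangle line by line; the function name is illustrative:

```python
def pascal_rows(n):
    """Return the first n lines of the Pascal triangle. Each entry is the
    sum of the two entries above it (with zeros beyond the edges)."""
    rows = []
    row = [1]
    for _ in range(n):
        rows.append(row)
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return rows

for r in pascal_rows(4):
    print(r)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
```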


3.5 At University College, London, in the 1890s Pearson, excited by 
Galton’s Natural Inheritance, began to apply mathematics and the
theory of probability to Darwinian evolution, and the era of modern 
statistics began. In 1901 the first issue of Biometrika, a journal devoted to 
the application of mathematics to biology, appeared; Pearson was one 
of the editors. 

In the twentieth century, statistics has developed far beyond its 1901 
confines and has proven to be useful in all fields of scientific endeavor. 


SUMMARY. The origins of experimental science lie shrouded in the 
past. However, a few traces can be found prior to the seventeenth 
century. In 1620 Francis Bacon argued that man can learn only by 
observation. With the rapid expansion of experimental science in the 
eighteenth and nineteenth centuries, scientists began to make use of the 
existent probability theory. 

Gauss and Laplace made extensive use of the normal curve in the 
nineteenth century, as did the great Belgian astronomer Quetelet. 

In a monumental piece of experimental research between 1856 and 
1863, Mendel laid bare the statistical laws of genetics and established the 
foundations of Mendelian genetics. Without any precedent, Mendel 
perceived that the genetic mechanism operated like a random device. 

In England the work of Darwin influenced Galton, who made important
contributions to the foundations of biometry. In turn, Pearson,
heavily influenced by the work of Galton, launched a career that was to
earn him the title of father of statistics.
EXERCISES

1. Look at the textbooks and periodicals in your major field to find 
some use of the normal curve. 

2. Find a description of the scientific method. At what stages of the 
scientific method do you feel statistics can enter? 

3. Look at Darwin’s Origin of Species. Did Darwin make use of any 
statistical arguments? 

4. To what extent are theories of evolution based on statistical evidence? 

5. In the science courses you have taken (high school and college) were 
there any scientific theories based strictly on theoretical reasoning 
without observations? Are you sure? 

6. Simulate Mendel’s pea experiment by tossing a pair of coins (a
quarter and a nickel). Let two heads correspond to YY. Let heads on 
the quarter and tails on the nickel correspond to Yg; let tails on the 
quarter and heads on the nickel correspond to gY, and let two tails 
correspond to gg. Toss the pair of coins 100 times and record the 
number of occurrences of YY, Yg, gY, and gg. Are the results in 
reasonable agreement with the 1:2:1 ratio? 

7. Carry on the theory of the pea breeding experiment to a third
generation, crossing a yellow pea from F₂ with a green pea from F₂.
What should be the ratio of yellow to green peas?
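Where coins are not at hand, Exercise 6 can be imitated on a computer. The sketch below is a hypothetical stand-in for the physical tosses, not part of the original exercise: Python’s pseudorandom generator replaces the quarter and the nickel, and the exercise’s mapping is kept (heads means Y, tails means g, quarter written first).

```python
import random
from collections import Counter

LETTER = {"H": "Y", "T": "g"}  # heads -> Y, tails -> g

def toss_pair(rng):
    """Toss a quarter and a nickel; the quarter's letter is written first."""
    quarter = rng.choice("HT")
    nickel = rng.choice("HT")
    return LETTER[quarter] + LETTER[nickel]

def simulate(n_tosses=100, seed=None):
    rng = random.Random(seed)  # pass a seed to make a run repeatable
    return Counter(toss_pair(rng) for _ in range(n_tosses))

counts = simulate(100, seed=7)
for outcome in ("YY", "Yg", "gY", "gg"):
    print(outcome, counts[outcome])
mixed = counts["Yg"] + counts["gY"]
print("YY : mixed : gg =", counts["YY"], ":", mixed, ":", counts["gg"])
```

With 100 tosses the 1:2:1 ratio corresponds to about 25 : 50 : 25; individual runs will wander around those values.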
R. A. FISHER. 1890-1962. Reproduced by permission from the
Department of Statistics, Iowa State University. 






POPULATION AND SAMPLE


4.1 In 1693 Edmund Halley gave tables for “the mortality of mankind” that
were obtained from mortality data for the city of Breslau. In the 
nineteenth century John Bennet Lawes used annual records of wheat 
yields on five plots at Rothamsted to estimate the change in yield per acre 
from 1852 to 1879 for all of England and Wales. These and other early 
examples of the use of sampling are given by Stephan [7]. They illustrate 
the idea of making inferences about a set of individuals, the population, 
having observed only a part of the population, the sample. We may have 
populations of plants, insects, rocks, words, books, laboratory
measurements, or even populations of populations. In most cases numbers are
associated with each member of the population; the statistician is
usually interested in the associated population of numbers. 

The distinction between population and sample represents one of the 
most important concepts in statistics. Making this distinction has been 
at the foundation of most of statistical theory and methodology since the 
late nineteenth century. This simple distinction was difficult to make in 
the early, formative years of statistics and provides no end of difficulty to 
today’s unwary students. Much of statistics is concerned with making
inferences about a population on the basis of a sample from that 
population. It is crucial that the distinction be made. 


4.2 As the nineteenth century ended, Pearson (1857-1936) and his associates
were applying statistical methods to a wide variety of problems in
evolution, heredity, and related fields. Through their work, mathematical
probability came to be intimately associated with statistics. A great
deal of their work was involved with the mathematical description of
populations called universes or aggregates (or kollectivs by the German
scholars). Although some of Pearson’s work had to do with samples, the
distinction between samples and populations was not always maintained.
In the early 1900s the distinction between population and sample
became better established, and Yule [9] made the distinction in his
textbook in 1910.


4.3 In a very important paper that appeared in 1922 entitled “On the 
Mathematical Foundations of Theoretical Statistics,” Fisher [2] did a 
great deal to solidify the concepts of population, sample, and the 
distinction between the two. He stated that the problems of statistics can
be divided into three types.

1. Problems of Specification. These have to do with the choice of 
the probability distribution to describe the population. 

2. Problems of Estimation. These involve the calculation of 
quantities from the sample called statistics to represent quantities 
from the population called parameters. 

3. Problems of Distribution. These problems involve the distribution
of statistics and will be discussed at length from Lecture 13
through the rest of the book.

Of course, there are other problems of statistics. For example, one of the 
major problems not covered by this classification is the design of 
experiments. This is discussed at some length in Lectures 28 to 30. 


4.4 In subsequent lectures a certain amount of notation will be introduced. 
The distinction between quantities in the sample and corresponding 
quantities in the population will be maintained by adhering to the 
following widespread, well-established convention. Where possible, 
population quantities will be denoted by Greek letters and sample 
quantities by Roman letters. This will help students to avoid confusion. 


4.5 In many subjects there is a canonical problem or a standard form that, 
although simplified, provides a basis for thinking and a base on which to 
develop more complicated ways of thinking. The canonical situation we 
wish to consider occurs when we have a random sample from a specified, 
existent population and we wish to determine from the sample the 
properties of the population. There are three points that we wish to 

make. 
1. The population is an existent population (e.g., the population of
sentences in this book). 

2. The sample is a random sample (to be defined presently), not a
representative sample (e.g., 10 sentences selected at random from the
population of all sentences).

3. We may be interested in properties of the population such as the 
average number of words per sentence, but we have only a sample. 
From the sample we must make our conclusions. 

We will now comment on points 2 and 3 in greater detail. There is a 
deeply ingrained feeling on the part of many people that the sample 
should be representative of the population—that it should be the 
population in miniature. A little thought will show that this is 
impossible. A sample of 8 people cannot possibly be “like” a population
of 2000 people. We would need the same proportional representation in
the sample as in the population for age, sex, race, education, income, etc. 
We might decide to choose 4 people under 30 and 4 people over 30. For 
each of the sets of 4 people we might then choose two males and two 
females. For each of the four sets of two we might choose one black and 
one white. To consider any of the other characteristics of the population 
would literally require splitting people into pieces. As appealing as the 
idea of a representative sample is, it has not proven to be a viable 
concept. There seems to be no way to give an objective definition of 
representative. 

The idea that is now widely accepted is that of choosing the sample at 
random. This does not mean haphazardly, subject to the whims and 
prejudices of the investigator. A random sample is a sample chosen so 
that every possible sample with the same number of observations has an 
equal chance of being chosen. 
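The definition can be checked on a population small enough to list every possible sample. The sketch below uses a hypothetical population of five individuals, A to E, and Python’s random.sample, which draws without replacement. Under the definition, each of the 10 possible samples of size 2 should turn up in roughly one-tenth of many repeated draws.

```python
import random
from collections import Counter
from itertools import combinations

population = ["A", "B", "C", "D", "E"]  # hypothetical tiny population
n = 2                                    # sample size

# Every possible sample of size n (order ignored): 10 of them here.
possible = list(combinations(population, n))

rng = random.Random(2024)
trials = 50_000
# Sort each drawn sample so that ("B", "A") and ("A", "B") count as one sample.
counts = Counter(tuple(sorted(rng.sample(population, n))) for _ in range(trials))

for s in possible:
    print(s, round(counts[s] / trials, 3))  # each should be near 1/10
```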

The formulation of a definition of random sample has sometimes been 
credited to John Venn and C. S. Peirce (pronounced “Pers”) (Carnap [1,
pp. 493-494]). Although Venn discusses randomness at length, he does
not give an explicit formulation of a random sample. On the other hand, 
the writings of Peirce are replete with definitions of a random sample. 
For example, Peirce [5] says, “A sample is a random one, provided it is 
drawn by such machinery, artificial or physiological, that in the long run
any one individual of the whole lot would get taken as often as any
other.”

The process of making conclusions from the sample about the 
population is called statistical inference. In subsequent lectures we will 
encounter three main areas of inference. 
1. Calculating numbers called estimates from the sample to use as
values for population constants called parameters. 

2. Calculating intervals of certainty (or uncertainty) about our 
estimates. 

3. Using statistical tests to judge the extent to which the sample 
information supports hypothesized values for the population 
parameters. 

Peirce clearly understood the role of samples in making inferences about 
populations. The definition of random sample is taken from the section 
titled “Reasoning from Samples,” and Peirce says, “The truth is that 
induction is reasoning from a sample taken at random to the whole lot 
sampled.” 


4.6 Once the decision has been made to choose a sample at random, there is 
the nontrivial problem of how to accomplish this. Conceptually, one of 
the simplest ways to go about it would be to number the individuals, 
write their numbers on tickets, and draw tickets from a hopper after 
stirring the tickets. Various other devices would be equivalent. However, 
a technique that has been in use for about 50 years is the use of random 
number tables. Instead of using a physical contraption such as a hopper 
with tickets or an urn with numbered balls, one simply turns to a table of 
random digits, reads the desired number of digits, and chooses the 
corresponding individuals for his sample. 
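Reading a sample from a table of digits can be spelled out step by step. In this sketch the function name and the rule of passing over out-of-range and repeated numbers are illustrative choices, one common convention rather than the book’s: the individuals are numbered 1 to N, the digits are read in fixed-width groups, and any number outside 1 to N, or already chosen, is skipped. Python’s pseudorandom generator stands in for the printed table; its digits are not Table A.1.

```python
import random

def sample_from_digits(digits, population_size, n):
    """Choose n distinct individuals, numbered 1..population_size, by reading
    `digits` in fixed-width groups and skipping numbers that are out of range
    (including 0) or already chosen."""
    width = len(str(population_size))  # e.g. 3 digits for a population of 250
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")

# Pseudorandom digits stand in for a printed table (this is not Table A.1).
rng = random.Random(11)
digits = "".join(rng.choice("0123456789") for _ in range(600))
chosen = sample_from_digits(digits, population_size=250, n=10)
print(chosen)
```

To use a printed table instead, type in its digits in the reading order decided on in advance and pass that string as `digits`.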

Apparently the first table of random digits was published by Tippett in 
1927 [8]. It was obtained by recording some of the digits taken from 
English tax records. Hald [4] presented 15,000 digits that were compiled 
from drawings in the Danish State Lottery of 1948. Fisher and Yates [3]
abstracted the fifteenth to nineteenth digits of the logarithms in
Logarithmetica Britannica and presented them as a table of random numbers. Certainly the
largest table of random numbers is the Rand Corporation publication of 
1 million random digits [6]. 

Table A.1 is abstracted from Hald’s tables. You will note that about
one-tenth of the digits are zeroes, one-tenth are ones, one-tenth are twos, 
etc. You will also note that the digits are reasonably well mixed. 
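Both observations, equal shares of each digit and good mixing, can be checked by counting. A sketch, assuming the digits are given as a string; the adjacent-repeat count is one simple check of mixing chosen here for illustration (for well-mixed digits about one neighboring pair in ten should repeat). The 15,000 digits below come from Python’s pseudorandom generator, not from Hald’s tables.

```python
import random
from collections import Counter

def digit_shares(digits):
    """Fraction of the digits that are 0, 1, ..., 9."""
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in "0123456789"}

def repeat_share(digits):
    """Fraction of neighboring pairs with the same digit twice;
    for well-mixed digits this should be near 1/10."""
    pairs = list(zip(digits, digits[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

rng = random.Random(1948)
digits = "".join(rng.choice("0123456789") for _ in range(15_000))

for d, share in digit_shares(digits).items():
    print(d, round(share, 3))
print("adjacent repeats:", round(repeat_share(digits), 3))
```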

Quite often the objection is voiced that “numbers recorded in a table 
cannot possibly be random,” or “the numbers cannot be random after 
they are written down.” Two points should be emphasized. 

1. A random number table is not random but is used to simulate or 
approximate the use of a random device. 
2. We would not necessarily wish to record the results from a random
device as random numbers. For example, a run of 1000 zeroes would 
be possible from the spinning of a wheel, but we would not want to
record a run of 1000 zeroes in our table. 

These ideas may require some thought in order to be accepted. How 
could digits recorded from a table of logarithms be more acceptable as a 
random number table than results obtained from a random device? The 
point is that in using a random number table we are more concerned 
with the output from the table than we are with the source of the table. 
We want the table to simulate the use of a random device, not record 
results from a random device. 


4.7 We have discussed the canonical situation of drawing a random sample 
from an existent population. This provides a very useful way of thinking 
about data and, therefore, of making progress in science. Frequently, 
however, the sample is real but the population is only conceptual. We
may have tossed a coin 50 times and obtained 30 heads. These 50 tosses
represent a random sample from the conceptual population of all
possible tosses. A similar situation also occurs frequently in the
experimental sciences. Suppose that we wish to compare the pain-relieving
effect of two new drugs, A and B. We select 50 subjects and
assign A to 25 and B to the other 25. Our sample consists of these 50
subjects, but what is the population? The population might be a much
larger group of people, one-half of whom are using A and one-half of
whom are using B. In fact, no such population exists; the experimenter
conceives of it as a population that could exist, one for which our
sample of 50 subjects could be a random sample.


4.8 One other departure from the canonical situation arises when the 
population may be real or conceptual but is not well specified. For 
example, suppose that an economist looks at price data for the last 70 
years. She may choose to regard those 70 years’ price data as a random 
sample from some population, necessarily vaguely specified (actually,
such data are referred to as time series). 

As you continue your study of statistics, difficulties will arise. Some of 
them can be surmounted if you insist on making the distinction between 
population and sample. 
SUMMARY. Explicit examples of sampling from some population can
be found in the early literature. However, recognition that only samples 
are being obtained from populations is often lacking. Although the 
distinction between sample and population was generally known by the 
end of the nineteenth century, the distinction was not always maintained. 

In the early 1900s the distinction became clearer and was emphasized 
strongly by Fisher in 1922. Following Fisher, statistical theory and 
methods were developed to deal with the problem of making inferences 
about a population using a random sample from the population. 

Implementation of a theory of random sampling requires a random 
device or a simulator of a random device. A very convenient simulator is 
provided by a random number table. 

Given a random sample from a population, statistical theory is 
concerned with making estimates of parameters, calculating intervals of 
uncertainty, and performing statistical tests. 

Finally, statistical theory deals with situations where the population is 
poorly defined or even conceptual. 
EXERCISES

1. Consider the population of sentences in Lecture 4 and suppose that 
you are concerned with the sentence length. How would you choose a 
random sample of 10 sentences? 

2. Would you consider the students in this course to be a random sample 
of the students in the university? Why or why not? 

3. Use Table A.1 to draw a random sample of two-digit numbers from a
population with equal proportions of the numbers 00, 01, 02,..., 99. 
Choose your sample in the following way. 

a. Pick the starting point by placing a pencil “at random” on the
page. 

b. Decide in advance whether to read horizontally, diagonally, 
vertically, etc. 

c. Read two-digit numbers (ignoring spacing in the table) starting at 
the starting point and using the rule decided on in part b. 

Does your sample seem reasonable? 

4. Survey the statistics books in your library to find random number 
tables other than the ones mentioned in this section. 

5. Not all who are eligible to do so vote in elections; those who vote 
constitute a sample of the population. 

a. Is it reasonable to consider the people who vote as a random 
sample of the population? 

b. Why are pre-election opinion polls highly suspect when there is 
low voter turnout? 

6. Use Table A.1 to draw a random sample of 10 distinct three-digit
numbers. Use these 10 numbers as page numbers of this book. 

a. In what ways do the sample pages resemble the book as a whole? 

b. In what ways are they markedly different? 

c. Use the sample pages to estimate the fraction of the book devoted 
to exercises. 
7. The local weather forecast often gives the probability of rain,
implying randomness of some sort. The local daily rainfalls for the 
past year are a sample of some population. What is the population? 

8. Small opinion surveys are often reported on television with the 
statement that the results are not intended to be scientific—they are 
merely a random sampling of opinion. The first 10 people leaving an 
exclusive men’s store are asked for their opinion. 

a. What is the population being sampled? 

b. Is the sample random? 

c. For what population could the sample be considered a random 
sample? 




KARL PEARSON. 1857-1936. Reproduced from Karl Pearson
1857-1957, © 1958, Cambridge University Press, by permission from the
Biometrika Trustees.
FREQUENCY DISTRIBUTIONS AND THEORETICAL CURVES


5.1 The first volume of Biometrika [9] begins with an editorial containing 
the following statement. 

The first condition necessary, in order that any process of Natural 
Selection may begin among a race, or species, is the existence of differences 
among its members, and the first step in an inquiry into the possible effect of 
a selective process upon any character of a race must be an estimate of the
frequency with which individuals, exhibiting any given degree of
abnormality with respect to that character, occur.*

This emphasis on frequency distributions typifies the scientific
literature of the period, which abounded with frequency distributions of all
sorts of data.


5.2 In 1898 Bortkiewicz [1] gave the frequency distribution of the number of
men per cavalry corps killed in 1 year by a kick from a horse (see Table
5.1). The early volumes of Biometrika contain many similar tabulations
on things such as the age of criminals, brain size, the number of sepals on
flowers, etc. Latter [4] gave the frequency distribution of the breadth of
cuckoo eggs (Table 5.2).

Inspection of modern research journals will also reveal frequency 
distributions on data from widely varying sources. Maddi [5] gave the 
data in Table 5.3 on the rating of attorneys by presiding judges.


* Reproduced from Biometrika, 1, Editorial, Copyright © 1901 by the Biometrika
Trustees. Reprinted by permission of the Biometrika Trustees.
Table 5.1 Number of Men Killed by Horsekicks (per Corps per Year)

Number Killed    Frequency
      0             109
      1              65
      2              22
      3               3
      4               1


The data in Table 5.4 were given by Kjetsaa [3] in a study of the 
disputed authorship of The Quiet Don. The data give the number of 
different words used per 1000 words for three works—two by known 
authors and one of disputed authorship. They show that The Quiet Don 
resembles the work by Sholokhov more than the work by Kryukov. This 
is particularly interesting in view of the fact that Sholokhov was awarded 
the Nobel Prize for Literature in 1965 for The Quiet Don. Refer to the 
article by Kjetsaa for full details. 
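The measure in Table 5.4 is easy to state as a procedure: take a sample of 1000 running words and count how many different words occur among them. A minimal Python sketch follows. The tokenizing rule (lowercased runs of letters, which also works for Cyrillic text) and the choice of the first 1000 words are assumptions made for illustration, not a description of Kjetsaa’s procedure.

```python
import re

WORD = re.compile(r"[^\W\d_]+")  # a run of letters (any alphabet), no digits or underscores

def distinct_words(text, sample_size=1000):
    """Number of different words among the first sample_size words of text."""
    words = WORD.findall(text.lower())
    if len(words) < sample_size:
        raise ValueError(f"only {len(words)} words; need {sample_size}")
    return len(set(words[:sample_size]))

sample_text = "the cat and the dog " * 250  # 1250 words, only 4 different ones
print(distinct_words(sample_text))           # 4
```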


Table 5.2 Frequency Distribution of Cuckoo's Eggs

Breadth (mm)     Frequency (number)
13.75-14.25              1
14.25-14.75              1
14.75-15.25              5
15.25-15.75              9
15.75-16.25             73
16.25-16.75             51
16.75-17.25             80
17.25-17.75             15
17.75-18.25              7
18.25-18.75              0
18.75-19.25              1

Source. Reproduced from Biometrika,
1, Latter, The egg of Cuculus canorus.
Copyright © 1902 by the Biometrika
Trustees. Reprinted by permission of
the Biometrika Trustees.
Table 5.3 Judges' Rating of Attorney Competence

Percent of Attorneys     Judges Reporting,
Rated Competent          Frequency (%)
>99                           4.0
90-99                        13.3
80-89                        24.4
70-79                        31.7
60-69                         5.8
50-59                         7.5
40-49                         6.5
30-39                         3.5
20-29                         1.9
10-19                         1.3
1-9                           0.1
<1                            0.0

Source. Reproduced from “Trial Advocacy Competence: The
Judicial Perspective,” Research Journal, 1978 (1), 46 pp.


The English clay tobacco pipe often provides valuable clues to 
students of historical sites. In 1954 Harrington published a chart relating 
the diameter of the hole in the stem to the date of the pipe. These data
were given by Hume [2] and are shown in Table 5.5. From this table it 
can be seen that measurement of the hole diameter would be of value in 
dating a pipe fragment. 


Table 5.4 Disputed Authorship: The Quiet Don

Work                                Words Sampled    Distinct Words
Marking Time (Kryukov)                   1000              589
The Way and the Road (Sholokhov)         1000              656
The Quiet Don                            1000              646

Source. Reproduced from Computers and the Humanities, Vol. 11, Kjetsaa,
The battle of the quiet don, Table I, p. 342. Copyright © 1977 by Pergamon
Press, Inc. Reprinted by permission of the author and publishers.
Table 5.5 Stem Hole Diameter versus Date (Percentage)

Diameter                            Date
(inches)   1620-1650  1650-1680  1680-1710  1710-1750  1750-1790
  4/64         0          0          0         13         77
  5/64         0          0         12         72         20
  6/64         0         18         72         15          3
  7/64        21         57         16          0          0
  8/64        59         25          0          0          0
  9/64        20          0          0          0          0

Source. Reproduced from Figure 96, p. 298 of Artifacts of Colonial America
by Ivor Noel Hume. Copyright © 1970 by Alfred A. Knopf, Inc. Reprinted by
permission of Alfred A. Knopf, Inc.
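One way to use Table 5.5 for dating a fragment is to look up its bore diameter and see which periods account for stems of that size. The sketch below copies the percentages from the table (each date column sums to 100) and ranks the periods for a given diameter. Ranking by these column percentages is a simple illustrative rule, not Harrington’s or Hume’s method, and it ignores how many pipes survive from each period; the names are hypothetical.

```python
# Percentages from Table 5.5: for each date period, the percentage of pipe
# stems with each bore diameter (in 64ths of an inch).
PERIODS = ["1620-1650", "1650-1680", "1680-1710", "1710-1750", "1750-1790"]
TABLE_5_5 = {
    4: [0, 0, 0, 13, 77],
    5: [0, 0, 12, 72, 20],
    6: [0, 18, 72, 15, 3],
    7: [21, 57, 16, 0, 0],
    8: [59, 25, 0, 0, 0],
    9: [20, 0, 0, 0, 0],
}

def likely_periods(diameter_64ths):
    """Periods in which stems of this diameter occur, highest percentage first."""
    row = TABLE_5_5[diameter_64ths]
    ranked = sorted(zip(PERIODS, row), key=lambda pair: pair[1], reverse=True)
    return [(period, pct) for period, pct in ranked if pct > 0]

print(likely_periods(6))
```

For a 6/64-inch bore the rule points first to 1680-1710, where 72 percent of stems have that diameter.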


5.3 Consider Table 5.2. Should an egg measuring exactly 14.75 be counted in
the class 14.25-14.75 or the class 14.75-15.25? Some textbooks take a
sophisticated approach to answering this question. It is unnecessary to
make the matter complicated; it is necessary to make the matter clear.
One might adopt the convention that the class 14.25-14.75 means up to
but not including 14.75, the class 14.75-15.25 means up to but not
including 15.25, etc. Whatever convention is adopted should be stated by
the person presenting the data. Another way of resolving the question of
boundaries between the classes is suggested by Table 5.2. If the breadth is
measured to the nearest half-millimeter, every measurement ends in .0 or
.5 while every class boundary ends in .25 or .75, so the classes have been
selected so that it is impossible for a measurement to fall exactly on a boundary.
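The first convention (each class includes its lower limit but not its upper limit) can be written as a rule for assigning a measurement to a class. A Python sketch for the classes of Table 5.2, which start at 13.75 and are 0.5 wide; the function and constant names are illustrative.

```python
import math

LOWEST = 13.75  # lower limit of the first class in Table 5.2
WIDTH = 0.5     # every class in Table 5.2 is 0.5 wide

def class_label(x, lowest=LOWEST, width=WIDTH):
    """Return the class containing x, where each class includes its lower
    limit but not its upper limit (so 14.75 goes in 14.75-15.25)."""
    k = math.floor((x - lowest) / width)  # index of the class, counting from 0
    lower = lowest + k * width
    return f"{lower:.2f}-{lower + width:.2f}"

for x in (14.7, 14.75, 16.0, 16.5):
    print(x, "->", class_label(x))
```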

These questions do not arise in the data of Table 5.1 and Table 5.4. The 
measurements in these tables are examples of discrete measurements. No 
matter how carefully one counts, the only possible measurements are 0, 
1, 2, 3, 4,.... By contrast, the measurements possible in Tables 5.2, 5.3, 
and 5.5 depend on how finely one measures, and they are called 
continuous measurements. 


5.4 When we construct a frequency distribution for continuous data,
grouping of the data into classes is required. Grouping may be done with
discrete as well as continuous data, and the appearance of the frequency
distribution depends on the number of classes.

Instead of tabulating the frequency of values in a class, we may record 
the relative frequency (the frequency divided by the total number of
observations). Alternatively, we may record the percentage (the relative 
frequency multiplied by 100), as in Table 5.5. 
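As an illustration of both forms, the sketch below converts the horsekick frequencies of Table 5.1 into relative frequencies and percentages, reading each of the 200 observations as one corps in one year, as the table heading suggests (Python; variable names are illustrative).

```python
# Table 5.1: number of men killed per corps per year, and frequency.
killed = [0, 1, 2, 3, 4]
frequency = [109, 65, 22, 3, 1]

total = sum(frequency)                       # 200 observations
relative = [f / total for f in frequency]    # frequency divided by the total
percent = [100 * r for r in relative]        # relative frequency times 100

for k, f, r, p in zip(killed, frequency, relative, percent):
    print(f"{k}  {f:4d}  {r:.3f}  {p:5.1f}%")
```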

5.5 It is often helpful to graph discrete frequency distributions as in Figure 
5.1. Here the height of the dot represents the relative frequency obtained 
from Table 5.1. With continuous data it is often more helpful to graph 
the frequency distribution using a histogram. Here the relative frequency
is represented by areas: f(x) is the relative frequency divided by the
class width, so that the area of each bar equals the relative frequency of
its class and the total area is one. This is illustrated in
Figure 5.2 for the data of Table 5.2. The midpoint of each class is called
the class mark; if one does calculations from the histogram, the class 
mark is used to represent all values belonging to that class. 
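The scaling that makes a histogram's total area equal to one can be checked numerically. This is a sketch under stated assumptions: the boundaries and counts below are hypothetical, and `histogram_heights` is an illustrative name. Each height is the relative frequency divided by the class width, and each class is represented by its class mark.

```python
def histogram_heights(boundaries, counts):
    """Return (class mark, height) pairs for an area-scaled histogram.

    The height f(x) of each bar is the relative frequency divided by the
    class width, so each bar's area equals its relative frequency.
    """
    n = sum(counts)
    bars = []
    for lo, hi, count in zip(boundaries, boundaries[1:], counts):
        width = hi - lo
        mark = (lo + hi) / 2  # class mark: the midpoint of the class
        bars.append((mark, (count / n) / width))
    return bars

# Hypothetical classes of width 0.5 with made-up counts.
boundaries = [14.25, 14.75, 15.25, 15.75, 16.25]
counts = [2, 6, 9, 3]
bars = histogram_heights(boundaries, counts)

# Total area = sum of height x width over all bars.
total_area = sum(h * (hi - lo) for (_, h), lo, hi in zip(bars, boundaries, boundaries[1:]))
print(bars)
print(total_area)  # 1, up to floating-point rounding
```

Dividing by the width, rather than plotting relative frequency directly, keeps the areas honest even when the classes have unequal widths.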

[Figure 5.1 Frequency Distribution: Horsekick Data. Vertical axis: relative frequency; horizontal axis: 0 to 5.]

[Figure 5.2: histogram; vertical axis f(breadth).]

5.6 In recent years the general public has become aware of photographs of
the earth, the moon, Jupiter, Mars, and Venus that have been “enhanced
by computer technology.” Although the complete process is impressive,
some of the basic concepts can be described in terms of the elementary
concepts of this lecture. Scollar [8] gives the picture, prior to
enhancement, in Figure 5.3. By marking off the picture into small square
areas and recording the shade of gray in each square, one can build a
histogram of shades of gray. This is shown in Figure 5.4. The various
peaks represent the light and dark areas of the picture. Think about what
can be done to the picture to sharpen the images. It seems quite
reasonable to make the bright areas brighter and to darken the dark
areas. This results in the enhanced picture shown in Figure 5.5.

The process of going from the original picture to the enhanced picture 
amounts to replacing a histogram with little spread by one with a great 
deal of spread. This is the same process that is involved when we turn the 
contrast knob on our television sets. We are spreading out the histogram. 
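Spreading out the histogram can be sketched as a linear contrast stretch. This is one simple technique and not necessarily the processing Scollar used. Gray levels are mapped linearly so that the darkest observed shade becomes 0 and the lightest becomes 255. The 0-255 scale, the sample gray levels, and the name `contrast_stretch` are all assumptions made for illustration.

```python
from statistics import pstdev

def contrast_stretch(levels, new_min=0, new_max=255):
    """Linearly rescale gray levels so that they span [new_min, new_max]."""
    lo, hi = min(levels), max(levels)
    if hi == lo:
        return list(levels)  # a uniform picture has no spread to stretch
    scale = (new_max - new_min) / (hi - lo)
    return [round(new_min + (g - lo) * scale) for g in levels]

# Hypothetical gray levels from a low-contrast picture (0 = black, 255 = white).
original = [100, 105, 110, 110, 115, 120, 120, 125, 130, 140]
enhanced = contrast_stretch(original)
print(enhanced)
print(pstdev(original), pstdev(enhanced))  # the histogram's spread increases
```

Dark squares become darker and light squares lighter, so the histogram of shades is spread over a wider range. This is the effect of turning up a contrast control.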



As scientists constructed frequency distributions and graphs such as 
those presented in this lecture, similarities were noted, and this suggested 
representing distributions by mathematical equations. 

Bortkiewicz noted that a probability formula published by Poisson 
[7] in 1837 gave numbers remarkably close to the observed relative 
frequencies. Using the Poisson formula

$$P(x) = \frac{e^{-0.61}(0.61)^x}{x!}, \qquad x = 0, 1, 2, \ldots$$

we can calculate the probability entries in Table 5.6.
Figure 5.5 Enhanced Photograph. (Reproduced from World Archaeology,
10, I. Scollar, Computer image processing for archaeological air photographs,
Plate 4. Copyright © 1978. Reprinted by permission of Rheinisches 
Landesmuseum, Bonn.) 


Table 5.6 Relative Frequency Distribution

Number Killed   Relative Frequency   Probability
0               0.545                0.544
1               0.325                0.331
2               0.110                0.101
3               0.015                0.021
4               0.005                0.003
For x = 0 we get 0.544, for x = 1 we get 0.331, etc. In a later lecture we
will say more about the Poisson distribution and explain how 0.61 was 
obtained. For the moment, consider the implication of having a 
mathematical formula. This suggests the possibility of finding a 
generating mechanism—a scientific law. 

The probabilities are also shown in Figure 5.1. 
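The Probability column of Table 5.6 can be reproduced directly from the formula with m = 0.61. This is a minimal sketch; `poisson_pmf` is an illustrative name. Rounded to three decimals, the entries for x = 1 through 4 agree with the table. The x = 0 value rounds to 0.543, one unit in the last digit below the printed 0.544.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """Poisson probability P(x) = e^(-m) m^x / x!  for x = 0, 1, 2, ..."""
    return exp(-m) * m ** x / factorial(x)

m = 0.61  # the value used in the formula above
observed = [0.545, 0.325, 0.110, 0.015, 0.005]  # relative frequencies, Table 5.6
for x, rel in enumerate(observed):
    print(x, rel, round(poisson_pmf(x, m), 3))
```

Placing the observed and computed columns side by side gives the same comparison that Bortkiewicz made.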


5.8 The Gaussian probability curve discussed in Lecture 3 was discovered to 
be useful for describing frequency distributions from many varied 
sources. Because of its wide applicability, Pearson called it the normal 
distribution in a Gresham College lecture in 1893. Consider what would 
happen to the histogram in Figure 5.2 if measurements were refined. The 
classes become smaller and smaller, and the boundary of the histogram 
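approaches a smooth, bell-shaped curve with equation

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

where $\mu$ locates the center of the curve and $\sigma$ governs its spread.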

5.9 The consequences of representing population frequency distributions by 
regular mathematical expressions have been great. The power of 
probability theory has been brought to bear on the development of 
statistical theory. Consequently, statistical methodology rests not on ad 
hoc methods but on rigorous statistical theory. A transition in thinking 
about probability models has taken place among theoretically trained 
statisticians. In the beginning mathematical models were used slowly, 
cautiously, as approximations to the frequency distributions of real 
populations. Today some people regard the probability models as reality 
and more irregular frequency distributions as approximations to reality. 
In my judgment, we should continue to regard probability models as 
approximate descriptions, sometimes remarkably good, of frequency 
distributions in the “real” world. 


SUMMARY. The scientific literature of the late 1800s and early 1900s 
abounded with frequency distributions of all sorts of data, as did the early 
volumes of Biometrika. This lecture presents frequency distributions on 
the number of soldiers killed by horsekicks, breadth of cuckoo eggs, 
ratings of attorney competence, frequency of words used in a work of 
disputed authorship, diameters of clay pipe stems, and shades of gray in 
image enhancement. 
The grouping of data into classes is discussed. This results in class
boundaries and class marks, the midpoints of each class. For each class 
the frequency or relative frequency of observations is recorded. 

With continuous observations the choice of classes affects the 
frequency distribution, no matter how finely one measures. With discrete 
observations the possible values are the same, regardless of how fine the 
measurement. 

In many cases the data can be well described by a theoretical 
probability formula. Two examples are given: the Poisson formula for the 
horsekick data and the normal distribution for the egg data. 
EXERCISES

1. At a small college the undergraduate English majors intending to 
attend graduate school took the Graduate Record Examination. 
The verbal scores were 552, 278, 342, 675, 725, 340, 482, 495, 627, 539,
428, 740, 329, 370, 590, and 462.
a. Using classes 200-300, 300-400, ..., construct the frequency
distribution. State the convention you use concerning scores that 
fall on the class boundary. 

b. What class boundaries could you use that would give the same
frequency distribution but would avoid the problem of observations
falling on the boundaries?


2. An owner of a large, 1975 automobile becomes interested in trading 
for a used, later-model small car and begins to study the used car 
lots. On a large lot of 40 cars, he jots down the ages of the cars on the 
lot. The ages to the nearest year are 3, 7, 3, 4, 5, 1, 2, 3, 4, 5, 9, 5, 6, 5, 4,
8, 4, 5, 2, 1, 6, 5, 6, 3, 2, 1, 7, 4, 5, 5, 4, 3, 2, 3, 4, 5, 8, 6, 6, and 5.

a. Construct the frequency distribution. 

b. Graph the frequency distribution obtained in part a, letting the 
ordinate value represent frequency. 

c. Using classes 0.5-1.5, 1.5-2.5, etc., represent the frequency 
distribution by a histogram so that the areas represent 
frequencies. 


3. In making a study of office operations, the manager becomes 
concerned with the amount of time spent processing incoming junk 
mail. She asks the administrative secretary to keep a record of 
incoming mail. The daily numbers of pieces of junk mail for a 4-week
period were 55, 127, 89, 72, 84, 77, 115, 130, 116, 120, 140, 125, 131,
115, 99, 126, 145, 124, 89, and 78.

a. Construct the frequency distribution using the cells 54.5-74.5, 
74.5-94.5, 94.5-114.5, 114.5-134.5, and 134.5-154.5. 

b. Represent the frequency distribution by a histogram and label the 
vertical axis so that frequencies are represented by areas. 



4. A local car dealer advertises actual gasoline mileage in addition to
EPA estimates. A group of 24 people who have purchased cars of the
same model decide to compare their experiences with the advertising.
The mileages (miles per gallon) were 18.7, 21.2, 17.8, 15.9, 16.0,
19.0, 18.6, 18.8, 20.2, 19.4, 19.3, 18.7, 16.1, 17.2, 19.2, 18.7, 19.3, 19.9,
18.8, 18.1, 17.7, 20.2, and 20.3.


a. Using classes 15.05-16.05, 16.05-17.05, etc., construct the 
frequency distribution and histogram. 

b. Does the histogram seem to support the advertised claim of 19 
miles per gallon? 


5. Construct a frequency distribution for the sample obtained in 
Exercise 3 of the previous lecture. Use classes 00-09, 10-19, ...,
90-99. Does the frequency distribution seem reasonable for a
random number table? 
6. Using a current road atlas, construct a frequency distribution for the
city sizes in your state. You will need to choose classes so that the 
number of classes is not excessive. 

7. Using the newspaper file in your library, construct a frequency 
distribution for the New York Stock Exchange closing prices for a 
stock of your choice over the past 3 months. 

8. Construct a frequency distribution for the coat sizes found in a local 
clothing store. Note that the clothing rack itself gives a visual 
presentation of a frequency distribution. 

9. Mosteller and Wallace [6] use frequency distributions of basic 
words as a way of studying papers of disputed authorship. Construct 
a frequency distribution for the number of occurrences of the word 
“of” per 100 words in this book. Do the same for some other 
statistics book and compare the distributions. You need only 
examine a few pages in each book.

10. Using classes 13.75-14.75, 14.75-15.75, ...for the data of Table 5.2, 
construct a frequency distribution and a histogram. How does the 
histogram compare with the histogram of Figure 5.2? 

11. From the data in Table 5.3, what percent of the judges rated at
least 70% of the attorneys competent? At least 50% of the attorneys 
competent? 

12. Construct a histogram of clay pipe stem hole diameters for each of 
the time periods given in Table 5.5. 

13. Construct your own image enhancement project. Shade a simple 
picture using only three shades of gray (possibly by using pencils of 
different hardness). Divide the picture, using horizontal and vertical 
lines, into 100 small, rectangular regions. Now build a frequency 
distribution and histogram for the three shades of gray. Do a second 
version of the picture involving five shades of gray, using lighter 
shades for the lights and darker shades for the darks. Construct the 
frequency distribution and histogram for the enhanced picture. 
MORTALITY TABLES. Births and deaths in preindustrial
times. Mortality in Preindustrial Times, Gregg International 
Publishers, 1973. From Observations on the Probabilities of the Duration 
of Human Life and the Progress of Population in the United States. 1793, in 
a letter from William Barton to David Rittenhouse. 






MOMENTS AND 
PERCENTILES 


6.1 During the same winter the Plateans, who were still being besieged by the 
Peloponnesians and the Boeotians, began to be distressed by failure of their
supply of food, and since there was no hope of aid from Athens nor any other 
means of safety in sight, they and the Athenians who were besieged with 
them planned to leave the city and climb over the enemy's walls, in the hope 
that they might be able to force a passage.... They made ladders equal in 
height to the enemy's wall, getting the measure by counting the layers of 
bricks at a point where the enemy's wall on the side facing Platea happened 
not to have been plastered over. Many counted the layers at the same time, 
and while some were sure to make a mistake, the majority were likely to hit 
the true count, especially since they counted time and again and, besides, 
were at no great distance, and the part of the wall they wished to see was 
easily visible. The measurement of the ladders then, they got at in this way, 
reckoning the measure from the thickness of the bricks* [4]. 


* Reprinted by permission of the publisher, Harvard University Press, from Thucydides,
Vol. II, translated by Charles Forster Smith, The Loeb Classical Library. Copyright ©
1965 by Harvard University Press.

This example of the use of the mode, or most popular value, in the fifth
century B.C. was cited by Wallis and Roberts [6]. The practice of forming
some sort of central value for data is certainly very old; Walker [5]
traces the use of the average to the time of Pythagoras. The concept of a
central value in data is called central tendency in statistics, and there are
many measures of the concept; the most usual measures are the mean,
median, and mode. The mean is the average value, the median is the
central value, and the mode is the most popular value. With the numbers
1, 2, 2, 3, 4, 7, 9, the three quantities are given by:

Mean = 4 
Median = 3 
Mode = 2 

All three measures are valuable, but the mean is the most widely used. 
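The three values can be checked by hand and with Python's standard `statistics` module. The by-hand median line assumes an odd number of observations, which is the case here.

```python
from collections import Counter
from statistics import mean, median, mode

data = [1, 2, 2, 3, 4, 7, 9]

# By hand: mean = total / count; median = middle of the sorted list (odd count);
# mode = most frequent value.
by_hand_mean = sum(data) / len(data)
by_hand_median = sorted(data)[len(data) // 2]
by_hand_mode = Counter(data).most_common(1)[0][0]

print(by_hand_mean, by_hand_median, by_hand_mode)  # mean 4, median 3, mode 2
print(mean(data), median(data), mode(data))        # same values from the statistics module
```

For an even number of observations, `statistics.median` averages the two middle values. The one-line by-hand version would need to be extended to do the same.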


6.2 In the United States most schoolchildren are taught to calculate the
arithmetic average, or mean, during the fourth or fifth grade. We wish to
relate the elementary definition of the mean to the frequency
distributions of previous lectures and to the probability distributions of
the lectures to follow. Therefore we will examine the arithmetic mean in
minute detail and rewrite it in a way that makes its relation to frequency
distributions more obvious. Consider a population consisting of the
numbers 2, 3, 2, 4, 2, 3, 4, 5, 3, 2. Then the population mean, $\mu$, is given by

2+3+2+4+2+3+4+5+3+2 

" =-io- 

= 3 

Regrouping the numbers in the numerator, we obtain the alternative 
forms 


(2)(4) + (3)(3) + (4)(2) + (5)(l) 

"-io- 

= 2(4/10) + 3(3/10) + .4(2/10) + 5(1/10) 

We are therefore led to the simple formula: 

= Z xj, 

i^\ 

where x,- denotes the distinct values,/) denotes the relative frequency of 
distinct values, and k denotes the number of distinct values. The 
calculation may be presented in tabular form, as in Table 6.1. 

Table 6.1 Calculation of Mean

x_i     f_i     x_i f_i
2       4/10    8/10
3       3/10    9/10
4       2/10    8/10
5       1/10    5/10
Total   1       30/10 = 3

If these same calculations are carried out on a sample, we denote the
sample mean by $\bar{x}$ and regard it as an estimate of the population mean, $\mu$.
If an individual is selected at random from the small population of the
preceding example, the relative frequencies become probabilities and the
formula for the population mean becomes

$$\mu = \sum_{i=1}^{k} x_i P(x_i)$$
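The regrouping can be checked with exact fractions: the ordinary average and the sum of $x_i f_i$ over the distinct values give the same result. This is a minimal sketch. `Fraction` keeps 4/10 and similar values exact, although it prints them in lowest terms (2/5, and so on).

```python
from collections import Counter
from fractions import Fraction

population = [2, 3, 2, 4, 2, 3, 4, 5, 3, 2]
n = len(population)

# Relative frequency f_i of each distinct value x_i, as exact fractions.
rel_freq = {x: Fraction(count, n) for x, count in sorted(Counter(population).items())}

direct = Fraction(sum(population), n)               # ordinary average
weighted = sum(x * f for x, f in rel_freq.items())  # sum of x_i * f_i

print(rel_freq)
print(direct, weighted)  # both equal 3
```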

6.3 By the time of Pascal and Fermat in the seventeenth century, a concept 
more primitive than probability, that of expectation, was already well 
established. In fact, in early papers probability was defined in terms of 
expected earnings. Life expectancy has also been a familiar concept for 
several hundred years since the appearance of the first mortality tables 
for human life. We now want to relate expectation (expectancy, expected 
value) to the mean. Consider Table 6.2. 

Table 6.2 Life Expectancy (Years)

Age   1900-1902   1976
0     49.24       72.8
10    51.14       64.2
20    42.79       54.6
30    35.51       45.3
40    28.34       35.9
50    21.26       27.2
60    14.76       19.4
70     9.30       12.9
80     5.30        7.9

Source: National Center for Health Statistics.

Although the table does not contain the information necessary to verify
its entries, we should recognize the population quantities it estimates
and the sort of data that would be required to verify the estimates. For
example, the entry of 49.24 is an estimate of the average lifetime for all of
the people born in the United States between 1900 and 1902.

Conceptually, it would be possible to calculate this number by the 
methods of the previous section once all of the lifetimes have been 
determined and recorded. By multiplying each lifetime by its relative 
frequency and summing, the mean lifetime could be determined. It is 
called the life expectancy at age zero, or at birth. Similarly, the mean 
remaining lifetime could be determined for all the people of age 30 born 
between 1900 and 1902 or of age 20 born in 1976, etc. The life expectancy 
at any age is simply the mean remaining lifetime for all people of that age. 
The figures in Table 6.2 are estimates based on probability models and 
extensive data. 

Expected value has come to be synonymous with mean value; the
mean value of $X$ is often denoted by $E(X)$ and read as “the expected
value of $X$.” Since the mean value is not always a member of the
population, one should be reminded that the expected value is not
necessarily a value we expect to observe; it is the average or mean value.
To summarize,

$$\mu = E(X) = \sum_{i=1}^{k} x_i f_i = \sum_{i=1}^{k} x_i P(x_i)$$

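Let us repeat the steps of Section 6.2 in order to calculate the expected
value of $X^2$. Then

$$E(X^2) = \frac{2^2 + 3^2 + 2^2 + 4^2 + 2^2 + 3^2 + 4^2 + 5^2 + 3^2 + 2^2}{10} = 2^2\left(\frac{4}{10}\right) + 3^2\left(\frac{3}{10}\right) + 4^2\left(\frac{2}{10}\right) + 5^2\left(\frac{1}{10}\right)$$

We are then led to the formula

$$E(X^2) = \sum_{i=1}^{k} x_i^2 f_i = \sum_{i=1}^{k} x_i^2 P(x_i)$$

We easily conclude that

$$E(X^3) = \sum_{i=1}^{k} x_i^3 f_i = \sum_{i=1}^{k} x_i^3 P(x_i)$$

and, in fact, for any function $g(X)$ of $X$,

$$E[g(X)] = \sum_{i=1}^{k} g(x_i) f_i = \sum_{i=1}^{k} g(x_i) P(x_i)$$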
Follow the details of the foregoing development carefully, because doing
so will provide an understanding of formulas that are meant to convey 
simple ideas but that can look formidable. 
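The general formula for $E[g(X)]$ translates almost word for word into code. In this sketch `expect` is an illustrative name. The distribution is the one from Section 6.2, stored as value-to-probability pairs.

```python
from fractions import Fraction

# Distribution of Section 6.2: distinct value -> probability (relative frequency).
dist = {2: Fraction(4, 10), 3: Fraction(3, 10), 4: Fraction(2, 10), 5: Fraction(1, 10)}

def expect(g, dist):
    """E[g(X)]: sum over the distinct values x of g(x) * P(x)."""
    return sum(g(x) * p for x, p in dist.items())

print(expect(lambda x: x, dist))       # E(X) = 3
print(expect(lambda x: x ** 2, dist))  # E(X^2) = 10
print(expect(lambda x: x ** 3, dist))  # E(X^3) = 183/5 = 36.6
```

Passing $g$ as a function makes it clear that the same weighting by $P(x)$ is used whatever $g$ happens to be.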

6.4 In the preceding sections we have introduced the concept of central 
tendency and several measures of that concept, with primary emphasis 
on the mean or expected value. In this section we want to consider the 
concept of variation, or dispersion, and measures of that concept. By the 
late 1800s well-defined measures of variation were in use. Pearson [3] is 
credited with the name standard deviation for the measure defined as the 
square root of the mean of the squares of deviations from the mean of the 
distribution. The symbol $\sigma$ was used to denote the standard deviation. In
terms of the previous section,

$$\sigma = \sqrt{E[(X-\mu)^2]} = \sqrt{\sum_{i=1}^{k} (x_i-\mu)^2 f_i}$$

For the example of Table 6.1 we illustrate the calculation of the standard 
deviation in Table 6.3. 

A word of caution may be helpful at this point. Beginning students 
may allow a few symbols to prevent them from trying to grasp the main 
ideas. Without practice one may forget how to calculate $\sigma$. For the
nonstatistician a general knowledge of the noncomputational concepts
may be of more lasting value. In all populations of numbers there is
variation. There are many measures of variation; one of the most useful 
measures is the standard deviation. If there is no variation, if all of the 
numbers are equal, the standard deviation is zero. Otherwise it is 
positive. 


Table 6.3 Calculation of Standard Deviation

x_i     f_i     x_i - μ    (x_i - μ)^2 f_i
2       4/10    -1         4/10
3       3/10     0         0
4       2/10     1         2/10
5       1/10     2         4/10
Total   1                  σ^2 = 10/10 = 1

σ = 1
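The steps of Table 6.3 can be sketched in Python, following the formula for $\sigma$ term by term: mean, squared deviations weighted by relative frequency, then square root.

```python
from fractions import Fraction
from math import sqrt

# Distinct values and relative frequencies from Table 6.1.
dist = {2: Fraction(4, 10), 3: Fraction(3, 10), 4: Fraction(2, 10), 5: Fraction(1, 10)}

mu = sum(x * f for x, f in dist.items())                    # mean
variance = sum((x - mu) ** 2 * f for x, f in dist.items())  # sum of (x_i - mu)^2 f_i
sigma = sqrt(variance)                                      # standard deviation

print(mu, variance, sigma)  # 3 1 1.0
```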
In 1918 Fisher [2] introduced the name variance to denote the
square of the standard deviation. It has virtually the same properties as
the standard deviation but has proven to be easier to work with in many
situations.

If we do the calculations in Table 6.3 with a sample instead of a
population, we obtain an estimate of the population variance. Frequently
the sample variance is computed instead with divisor $n - 1$ and denoted by
$s^2 = \sum (x_i - \bar{x})^2/(n - 1)$.
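The two divisors can be compared on a small sample. The sample below is hypothetical, and the function names are illustrative. Python's standard library provides both versions: `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1.

```python
from statistics import pvariance, variance

def divide_by_n(xs):
    """Sum of squared deviations from the sample mean, divided by n (as in Table 6.3)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Hypothetical sample of five observations.
sample = [4, 7, 5, 9, 5]
print(divide_by_n(sample), sample_variance(sample))  # 3.2 and 4.0
print(pvariance(sample), variance(sample))           # the statistics module agrees
```

The $n - 1$ version is always somewhat larger. The difference matters most for small samples.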


6.5 The analogy between the moments of mechanics and the averages of 
statistics was noticed by early workers in the field of probability, and 
Pearson apparently first used the word moment in a statistical sense in 
1893 (see Walker [5, p. 73]). It may seem startling that there is any 
relationship between life expectancy and the center of gravity of a bridge 
structure but, once the two quantities are formulated mathematically, 
the relationship is obvious. 

Just as statistical averaging goes back to antiquity, the use of moments 
in the study of mechanics can be traced to the ancient writers. Cajori [1] 
credits Archimedes with the theory of the center of gravity, called the first 
moment. 

Suppose that k objects of unequal mass are placed on a board and 
balanced on a fulcrum, as in Figure 6.1. The location of the center of 
gravity is easily obtained from the formula

$$c = \frac{\sum_{i=1}^{k} x_i M_i}{\sum_{i=1}^{k} M_i}$$

where

$x_i$ = distance of the $i$th object from the left end of the board
$M_i$ = mass of the $i$th object

Figure 6.1 Center of Gravity.
Now, if we denote the relative mass by $m_i = M_i / \sum_{j=1}^{k} M_j$, the formula
becomes

$$c = \sum_{i=1}^{k} x_i m_i$$

an expression completely equivalent to the mean or expected value.

In the construction of structures it is important to have some measure 
of the rotational inertia. A measure in use for hundreds of years is the 
moment of inertia, given by the formula 

$$\text{Moment of inertia} = \sum_{i=1}^{k} (x_i - c)^2 M_i$$

A related measure is the radius of gyration, given by

$$(\text{Radius of gyration})^2 = \sum_{i=1}^{k} (x_i - c)^2 m_i$$
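The correspondence can be made concrete with point masses on a board. In this sketch the masses are hypothetical, chosen in proportion to the frequencies 4, 3, 2, 1 of Section 6.2, and the positions are 2, 3, 4, 5. Under that assumption the center of gravity equals the mean and the radius of gyration equals the standard deviation.

```python
from math import sqrt

def center_and_gyration(positions, masses):
    """Return (center of gravity c, radius of gyration) for point masses on a board."""
    total = sum(masses)
    rel = [m / total for m in masses]                  # relative masses m_i
    c = sum(x * m for x, m in zip(positions, rel))     # c = sum x_i m_i
    k2 = sum((x - c) ** 2 * m for x, m in zip(positions, rel))
    return c, sqrt(k2)

# Hypothetical masses proportional to the frequencies 4, 3, 2, 1.
c, k = center_and_gyration([2, 3, 4, 5], [4, 3, 2, 1])
print(c, k)  # about 3 and 1: the mean and standard deviation of Section 6.2
```

Doubling every mass leaves both results unchanged because only the relative masses $m_i$ enter the formulas. Relative frequencies behave the same way.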

The analogy between mechanics and statistics can be made stronger 
by considering the graph of the frequency distribution. Graphing the 
frequency distribution of Section 6.2, we have Figure 6.2.

[Figure 6.2: frequency distribution of Section 6.2; vertical axis f.]

If we think of the probability as mass, the familiar mean, or average, can
be thought of as the center of gravity of our frequency distribution, and
the standard deviation can be thought of as the radius of gyration. This
analogy has proven helpful to many people. If the numbers in the
population are scattered widely about the mean, the distribution might be
thought of as highly unstable, with a large radius of gyration, or standard
deviation.
6.6 Walker considers Galton to have been the originator of the terms median
and percentile, although median was actually suggested earlier by Gauss.
The basic idea of the median is that of dividing the population into two
equal parts, large numbers and small numbers, and the median is the
dividing line. There is some indeterminacy with the idea (e.g., consider
the population 1, 1, 1, 1, 1, 2, for which there is no natural division) but, for
populations of real data, there is little difficulty, and the median is used
widely.

Percentiles were developed by Galton (see Walker) for purposes
closely related to their modern usage. The basic idea is that we rank the 
numbers of the population from low to high and then divide the 
population into 100 equal parts. Then the ninetieth percentile is a 
number such that 90% of the numbers in the population are less and 10% 
are greater. It may be helpful to note that the fiftieth percentile is, by 
definition, the median. 
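Because of the indeterminacy mentioned above, several conventions for computing percentiles are in use. The sketch below uses one common convention: linear interpolation between ranked values. It is not necessarily Galton's procedure. This convention is chosen because it makes the 50th percentile agree with the usual median. The function name `percentile` is illustrative.

```python
from math import floor
from statistics import median

def percentile(data, p):
    """p-th percentile (0 <= p <= 100) by linear interpolation between ranked values."""
    xs = sorted(data)
    pos = (len(xs) - 1) * p / 100  # fractional rank, counting from 0
    lo = floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac

data = [1, 2, 2, 3, 4, 7, 9]
print(percentile(data, 50), median(data))  # both 3
print(percentile(data, 90))                # about 7.8
```

Other conventions, such as taking the nearest ranked value, can give slightly different answers for small populations. For large populations of real data the differences are usually negligible.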

6.7 Given only a few numbers, it is doubtful that one should perform any 
calculations. However, frequency distributions and graphs may help one 
to comprehend a large population. Furthermore, parameters such as the 
mean, standard deviation, and percentiles may be of assistance. 
However, it is only with experience that they can be of any great 
assistance in understanding a population of numbers. 


SUMMARY. Measures of the central value can be traced at least as far
back as the fifth and sixth centuries B.C. Of the many measures possible, the
most widely used are the mean, median, and mode. The mode is the most
frequent value, the median is the middle value, and the mean is the
average value.

The expected value of a function of $X$, $g(X)$, is defined to be $\sum g(x)P(x)$
and is denoted by $E[g(X)]$. This terminology is consistent with the use of
the term life expectancy to describe the average lifetime of humans. With
the definition of expectation, the population mean is simply the expected
value of $X$, $E(X)$, and is denoted by $\mu$. The sample mean, denoted by $\bar{x}$, is
an estimate of the population mean.

The most usual measure of population variation is the variance,
defined by $E[(X - \mu)^2]$ and denoted by $\sigma^2$. The square root of the
variance, denoted by $\sigma$, is called the standard deviation. The sample
variance and standard deviation are denoted by $s^2$ and $s$, respectively,
and are used as estimates of the population variance and standard
deviation. 

The analogy between the mean and variance and the center of gravity 
and moment of inertia prompts the use of the word moments to describe 
the mean and variance. 

In addition to the mean and variance, percentiles are also used as 
measures of central tendency and variation. 


REFERENCES 

1. Cajori, Florian. (1899). A History of Physics, New York: Macmillan.

2. Fisher, R. A. (1918). “The Correlation Between Relations on the 
Supposition of Mendelian Inheritance,” Transactions of the Royal 
Society of Edinburgh, 52, 399-433. 

3. Pearson, Karl. (1894). “Contributions to the Mathematical Theory of
Evolution. I. On the Dissection of Asymmetrical Frequency Curves,”
Philosophical Transactions, A-185, 71-110.

4. Smith, Charles Forster. (1920). History of the Peloponnesian War, 
Books III and IV (English translation of Thucydides II), Cambridge, 
Mass.: Harvard University Press. 

5. Walker, Helen M. (1929). Studies in the History of Statistical Method, 
Baltimore: Williams & Wilkins. 

6. Wallis, W. A., and H. V. Roberts. (1956). Statistics: A New Approach, 
Glencoe, Ill.: Free Press. 


EXERCISES 

1. For the population of numbers 8, 8, 8, 8, 8, 8, 8, 8, 8, and 8, calculate $\mu$,
$\sigma^2$, the mode, and the median.

2. Show algebraically that the mean of a constant is the constant and 
that the standard deviation is zero. 

3. For the two sets of numbers 10, 8, 7, 9, 5, 12, 8, 6, 8, 2 and 20, 16, 14,
18, 10, 24, 16, 12, 16, 4, calculate $\mu$ and $\sigma^2$.

4. Show algebraically that if the mean and variance of a set of numbers
are $\mu$ and $\sigma^2$, the mean and variance of a set of numbers obtained by
multiplying each original number by $c$ are $c\mu$ and $c^2\sigma^2$.
5. Which of the following sets of data is more variable? Justify your
answer. 


Set 1   Set 2
10      110
18      118
25      125
16      116
22      122
30      130
26      126
24      124
25      125
23      123


6. Show algebraically that the mean and variance of a set of numbers
obtained by adding a constant $c$ to each of the original numbers are
$\mu + c$ and $\sigma^2$.

7. What is the expected value, $\mu$, of a one-digit number generated by a
random device that assigns equal probability to the integers 0, 1,
2, ..., 9?

8. What is the expected coat size, $\mu$, for the frequency distribution of
Lecture 5, Exercise 8? Do you think this is the expected coat size for 
the potential customers of the store? 

9. Using the data of Table 5.1, estimate $\mu$, the expected number of men
killed by horsekicks. Is this the number used in the equation given in
Section 5.6? Estimate the standard deviation.

10. Using the data from Table 5.2, estimate the expected breadth of an
egg, $\mu$, and the standard deviation, $\sigma$. Do your calculation with the
class marks 14.00, 14.50, ....

11. Explain the effect of the brightness and contrast controls on your 
television set in terms of changing the mean and standard deviation. 

12. Verify by example that $E(X^2)$ is given by

$$E(X^2) = \sum_{i=1}^{k} x_i^2 P(x_i)$$

13. Repeat Exercise 10 on a coded set of values obtained by subtracting 
15.00 from each class mark and then multiplying by 100. Denote the
values obtained by x' and What decoding must be done to obtain 
the values of x and in Exercise 10? 
14. Verify algebraically the results implied by Exercise 13; i.e., if

$$X' = a + bX,$$

then

$$\mu' = a + b\mu$$

and

$$\sigma'^2 = b^2\sigma^2$$

Note that this result describes image enhancement in that the mean
of the histogram is shifted and the variance is changed.

15. Calculate the mean $\mu$ and variance $\sigma^2$ for the GRE scores of Lecture
5, Exercise 1. Also calculate the mean and variance using the class 
marks and frequencies of the frequency distribution. 

16. Calculate the mean and variance for the age of automobiles on the 
used car lot described in Lecture 5, Exercise 2. Perform the same 
calculation using the class marks and frequencies of the frequency 
distribution. Are the results different? Why or why not? 

17. Consider the data of Lecture 5, Exercise 3, to be a random sample. 
Calculate the sample mean and variance of the number of pieces of 
junk mail per day. Repeat the calculation using the class marks and
frequencies of the frequency distribution. 

18. Considering the 24 cases of Lecture 5, Exercise 4 as a sample,
calculate the sample mean, $\bar{x}$, and variance, $s^2$, of gasoline mileage.
Repeat the calculation using the class marks and frequencies of the
frequency distribution.
INTERPRETATION OF
PROBABILITY 


7.1 Probability has many meanings. Those who use the English language 
are accustomed to words with multiple definitions, where the meaning 
depends on the context. However, it sometimes comes as a shock that 
words used in technical theories are not uniquely defined. How can 
words such as electron, mass, energy, and force have different meanings? 
How can probability have different interpretations? 

The word probability is used basically in two different ways: (1) an
intrinsic property of some system that does not depend on our knowledge 
of that system, and (2) a measure of belief in some statement. For years a 
controversy has existed among some probabilists and statisticians about 
the proper meaning of the word probability. At other times bitter 
controversy has resulted from a failure to distinguish between the two 
basic meanings. The argument goes back at least to von Mises [8] in 
1919 and Keynes [4] in 1921 and continues to the present. The situation 
is illustrated by two books in elementary statistics. Savage [7] begins his 
book with the view that probability is a quantitative measure of 
uncertainty. Noether [5] presents the degree of belief interpretation as a 
secondary interpretation. The different uses of the word probability lie at 
the heart of the Bayesian controversy, which we discuss in Lecture 31. 


7.2 Scientific theories are built on undefined (or ill-defined) terms. Attempts 
to define mass, force, and acceleration are unsatisfactory. Yet airplanes 
fly, locomotives move, and satellites orbit the earth because of theories 
based on these terms. Electrons have sometimes been described as 
particles, sometimes as waves, sometimes as both, and transistor 
technology advances even though the word electron is not well defined. 
An analogous situation exists with probability. Statistical methods and
probability models have proven themselves useful, although probability 
itself has not been well defined. 


7.3 It is useful to distinguish between the concepts we are trying to describe 
and our attempted descriptions (models). Table 7.1 emphasizes this 
distinction. 

If a lecturer asks a class for the probability that a regular coin will land 
heads, the answer will probably be a resounding “one-half.” (Note that 
the previous sentence uses probability both as a property of the coin and 
as a measure of degree of belief.) The answer of one-half is undoubtedly a 
conditioned response; nevertheless, it is clear that the answer is an 
attempt to describe a property of the coin and not a degree of belief. The 
property is the long-run proportion of heads, or long-run relative 
frequency. To illustrate the idea, I tossed a coin 50 times. The results are 
shown in Table 7.2. The number of the toss is shown in the first column 
and is denoted by n. The result of the toss is shown in the second column 
under H or T. The total number of heads at any given toss is shown in the 
third column and is denoted by n(H). Finally, the relative frequency of
heads, calculated by n(H)/n, is given in the fourth column. These results
are shown graphically in Figure 7.1. Note that after 50 tosses the relative 
frequency has settled down in the neighborhood of one-half. We would 
suspect that if we tossed further, the relative frequency would fluctuate 
above and below one-half but would settle even closer to the long-run 
relative frequency of one-half. 
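The fourth column of Table 7.2 is a running count of heads divided by the number of tosses. As a minimal sketch (in Python, our choice of language), the code below regenerates that column from the H/T record. The string TOSSES is the second column of Table 7.2 transcribed in order, and the function name running_relative_frequency is a hypothetical label of ours.

```python
# H/T record transcribed from Table 7.2 (tosses 1-50), in groups of ten.
TOSSES = "HHHTHHHTTH" "HHTHTTTTTT" "HTTHTTTHTH" "HHTTTHTTTT" "HTTTHHHHTH"


def running_relative_frequency(tosses):
    """Return a list of (n, n(H), n(H)/n) tuples, one per toss."""
    rows = []
    heads = 0
    for n, outcome in enumerate(tosses, start=1):
        if outcome == "H":
            heads += 1
        rows.append((n, heads, heads / n))
    return rows


rows = running_relative_frequency(TOSSES)
for n, heads, freq in rows[9::10]:  # every tenth toss
    print(f"{n:3d} {heads:3d} {freq:.4f}")
```

Printing every tenth row gives n(H)/n = 0.7000, 0.5000, 0.4667, 0.4250, and 0.4600 at tosses 10, 20, 30, 40, and 50, in agreement with the table.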

There are several coin-tossing experiments recorded in the literature. 
A well-known example is given by Kerrich [3]. While interned at Hald,
Denmark, during World War II, he performed several experiments. For 
example, he tossed a coin 10,000 times and made 5000 draws of two balls 
from an urn. Initially in the coin-tossing experiment the relative 
frequency of heads fluctuated widely but, after 10,000 tosses, it settled 
down in the vicinity of one-half, with a value of 0.507 on the final toss. 
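A computer can stand in for the physical coin in a 10,000-toss experiment like the one just described. The Python sketch below assumes a fair coin modeled by Python's pseudo-random number generator, which is an idealization and not a physical coin; the seed, the checkpoints, and the function name are arbitrary choices for illustration.

```python
import random


def simulate_relative_frequency(n_tosses, seed=None,
                                checkpoints=(10, 100, 1_000, 10_000)):
    """Toss a simulated fair coin n_tosses times; record n(H)/n at each checkpoint."""
    rng = random.Random(seed)      # seeded so that a run can be repeated exactly
    heads = 0
    report = {}
    for n in range(1, n_tosses + 1):
        if rng.random() < 0.5:     # assumed fair coin: heads with chance one-half
            heads += 1
        if n in checkpoints:
            report[n] = heads / n
    return report


for n, freq in simulate_relative_frequency(10_000, seed=1).items():
    print(f"after {n:6d} tosses: n(H)/n = {freq:.4f}")
```

The printed values depend on the seed, but in typical runs the early frequencies wander widely while the value after 10,000 tosses lies close to one-half.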

Table 7.1 Interpretations and Models of Probability

Interpretation                       Model
Property of a system                 1. Mathematical limit
(long-run frequency)                 2. Collectives of von Mises
Measurement of degree of belief      1. Logical relation theory
                                     2. Subjective probability
Table 7.2 Relative Frequency of Heads

 n   H or T   n(H)   n(H)/n        n   H or T   n(H)   n(H)/n
 1     H       1    1.0000        26     T      12    0.4615
 2     H       2    1.0000        27     T      12    0.4444
 3     H       3    1.0000        28     H      13    0.4643
 4     T       3    0.7500        29     T      13    0.4483
 5     H       4    0.8000        30     H      14    0.4667
 6     H       5    0.8333        31     H      15    0.4839
 7     H       6    0.8571        32     H      16    0.5000
 8     T       6    0.7500        33     T      16    0.4848
 9     T       6    0.6667        34     T      16    0.4706
10     H       7    0.7000        35     T      16    0.4571
11     H       8    0.7273        36     H      17    0.4722
12     H       9    0.7500        37     T      17    0.4595
13     T       9    0.6923        38     T      17    0.4474
14     H      10    0.7143        39     T      17    0.4359
15     T      10    0.6667        40     T      17    0.4250
16     T      10    0.6250        41     H      18    0.4390
17     T      10    0.5882        42     T      18    0.4286
18     T      10    0.5556        43     T      18    0.4186
19     T      10    0.5263        44     T      18    0.4091
20     T      10    0.5000        45     H      19    0.4222
21     H      11    0.5238        46     H      20    0.4348
22     T      11    0.5000        47     H      21    0.4468
23     T      11    0.4783        48     H      22    0.4583
24     H      12    0.5000        49     T      22    0.4490
25     T      12    0.4800        50     H      23    0.4600


[Figure 7.1: n(H)/n plotted against n]
The long-run relative frequency is not well-defined, but there is
something real about it that any of us can experience by tossing a coin. 


7.4 One of the most obvious ways to try to describe the long-run frequency
property is in terms of a mathematical limit. This is often called the
limiting frequency definition of probability. If n denotes the number of
trials and n(H) the number of occurrences of an event H, then the
probability of H, denoted by P(H), is defined by


$$ P(H) = \lim_{n \to \infty} \frac{n(H)}{n} $$


This definition has proved to be very helpful, although it falls short of
describing the phenomenon exhibited by coin tossing. If the relative
frequency of heads approached one-half as a mathematical limit, then for
any tolerance, however small, there would have to be a point beyond which
it would never again be further than that tolerance from one-half. But we
have to admit the possibility of an arbitrarily long run of tails at any
point. Thus the relative frequency of heads may be closer to one-half after
10,000 tosses than after 20,000 tosses, and no number of tosses can
guarantee that it will stay near one-half from then on. A mathematical
limit requires exactly such a guarantee.
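The point can be explored, though not proved, by simulation. The Python sketch below is our own illustration, assuming a fair coin modeled by Python's pseudo-random number generator. It simulates many sequences of 20,000 tosses and reports the fraction in which the relative frequency is farther from one-half after 20,000 tosses than after 10,000. The helper heads_in counts heads by treating each random bit as one toss; the names and parameter values are hypothetical choices.

```python
import random


def heads_in(n_tosses, rng):
    """Heads in n_tosses fair tosses: each of n_tosses random bits is one toss."""
    return bin(rng.getrandbits(n_tosses)).count("1")


def share_farther_later(replicates=2_000, n=10_000, seed=1):
    """Fraction of simulated sequences in which n(H)/n is farther from 1/2
    after 2n tosses than it was after n tosses."""
    rng = random.Random(seed)
    farther = 0
    for _ in range(replicates):
        first = heads_in(n, rng)       # heads in tosses 1..n
        second = heads_in(n, rng)      # heads in tosses n+1..2n
        gap_n = abs(first / n - 0.5)
        gap_2n = abs((first + second) / (2 * n) - 0.5)
        if gap_2n > gap_n:
            farther += 1
    return farther / replicates


print(share_farther_later())
```

A rough normal-approximation calculation suggests the printed fraction should come out near 0.35: in roughly a third of such sequences, the extra 10,000 tosses leave the relative frequency farther from one-half than before.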


7.5 A vigorous effort to define probability by limiting frequency was made 
by von Mises. In order to avoid the difficulty of a sequence (such as the 
coin-tossing example) with no limiting frequency, von Mises restricted 
the definition of probability to regular sequences with limiting
frequencies. He gave the name collective (Kollektiv) to such a sequence. It is not
generally felt that the work of von Mises resolved the difficulty inherent 
in the limiting frequency definition. As Hacking [1] states, “If there are 
skeptics who insist that the frequency in the long run with which the coin 
falls heads is no property of anything, they have this much right on their 
side: the property has never been clearly defined.” 

7.6 There are basically two theories of probability as a measure of belief:
logical relation theory and subjective probability. Probability, as
presented by the logical relation theory, measures the extent to which one
evidence statement supports another statement. This view, first
presented by Keynes in 1921, has been championed in recent years by
Jeffreys [2], among others.
The theory of subjective probability treats probability as a highly
personal thing, different for each person. The theory relies on betting 
situations to obtain probabilities for each person. The idea of using 
betting in this way was apparently first put forward by Ramsey [6] in 
1931. 


7.7 I can summarize my attitude with the following comments. 

1. There are two main interpretations of probability. 

2. Many arguments would be resolved if different words were used for 
these two interpretations. 

3. It is unlikely that different names for probability will gain general 
acceptance. 

4. Both interpretations are useful. 

5. The relative frequency concept has been of more use in statistics than 
the degree of belief interpretation. 

6. Recent work in Bayesian statistics has made important advances for 
the degree of belief interpretation in statistics. 


SUMMARY. Probability is an undefined term; put another way, we 
say that it has many different meanings. The two most common
interpretations of the term are (1) an intrinsic property of a physical system, and
(2) a measure of belief in the truth of some statement. 

The major attempt to define probability as an intrinsic property 
regards it as a limiting relative frequency in an infinite sequence. The 
difficulty with this attempt is with the mathematical limit, which lacks 
some of the properties we would desire. Von Mises attempted to avoid 
this problem by restricting the class of infinite sequences to those having 
desirable properties; he called these sequences collectives. 

Probability as a measure of belief has been developed either in terms of 
logical relationships between statements or in terms of the beliefs unique 
to individuals, called subjective probabilities. 

Although no satisfactory definition of probability as a limiting 
frequency has been given, the phenomenon can be observed by making 
many trials, such as tossing a coin, and observing that the relative 
frequency tends to settle down around a limiting value. 

The single word probability continues to be used to describe rather 
different ideas, resulting in some controversy. Fortunately, the rules are 
the same for both interpretations. 
EXERCISES

1. To become aware of the problem of interpretation, ask several 
people “What is probability?” 

2. I asked a class of 73 students the following question. “I have tossed a
regular U.S. coin 25 times and have obtained 23 heads and 2 tails.
What is the probability of heads on the next toss? You may vote for
(a) greater than one-half, (b) less than one-half, (c) one-half.” The
responses were:

Greater than one-half    12
Less than one-half       10
One-half                 44
Abstentions               7

What interpretation do you think was being followed by those who
voted for (a)? For (b)? For (c)?

3. What interpretation of probability is being used in the reporting of 
weather news? 

4. In the coin-tossing experiment of Table 7.2 can we say with certainty 
that the relative frequency of heads will approach one-half as a limit 
as n becomes infinite? Can we say that the probability is high that
such a thing will happen? 

5. What interpretation of probability is being used when odds are 
stated on the outcome of an athletic event? 

6. Toss a die 50 times, recording each time whether a 1 occurs or not. 
Construct a table similar to Table 7.2 for n(1)/n. Then graph the
results as in Figure 7.1. Has the ratio settled down as much as the 
coin-toss ratio after 50 trials? 

7. What interpretation (or interpretations) of probability is being used 
in each of the following statements?

a. The probability of a meltdown at a specified nuclear power plant 
is .00001. 

b. The probability is .9 that the Notre Dame football team will be 
ranked among the top 10 teams. 

c. The probability is .05 that there is life on another planet. 

d. The probability of an engine mount failure on an airplane is .001. 

e. The probability of an adequate petroleum supply 20 years hence 
is .001. 

f. The probability of buying a new car with a major defect is .007. 

8. Toss a pair of dice 100 times and record the number of double sixes.
Record n(double sixes)/n and graph as in Table 7.2 and Figure 7.1.
How many tosses would seem to be necessary in order for the results
to settle down in the vicinity of 1/36?

9. An instructor has become embroiled in a dispute over alleged student
dishonesty. It has finally been established that the probability that
the student actually took all four exams (instead of a stand-in on some
exams) is .9. Then it is discovered that he did not take the first exam.
Should the probability that he took the last three be greater or less
than .9?

10. A friend is willing to bet $10 against $5 that team A will win a football
game. What is his subjective probability that team A will win? 




RULES OF
PROBABILITY 


8.1 In the previous lecture we discussed the interpretation of probability, a 
difficult, philosophical, and controversial matter. Next we take up the
rules of probability (the calculus of probability). The rules are easily 
stated, nonphilosophical, and almost universally accepted. The manner 
of presentation in vogue in recent years is to state a few axioms and then 
a number of rules that result from these axioms. This is often called the 
axiomatic theory of probability. From our point of view, the axiomatic 
approach completely avoids the important topic of interpretation and 
concentrates on laying the groundwork for computing probabilities. The 
axiomatic approach is possible because there is almost complete 
agreement on the allowable operations with probability. 

We will state some definitions and axioms motivated primarily by the 
relative frequency interpretation, but we will also try to relate these 
axioms to a degree of belief interpretation. 


8.2 Modern set theory provides a convenient vehicle for carrying the 
necessary definitions. The set of all possible outcomes, called the sample 
space, will be denoted by S. Any subset of S will be called an event, usually 
denoted by A, B, C, etc. To illustrate the idea suppose we draw a card 
from the set of cards consisting of the ace, king, and queen of hearts. 
There are three possible outcomes, and we denote the sample space by 

S = {A, K, Q} 

There are eight possible events (subsets), consisting of S, the empty set ∅,
{A, K}, {A, Q}, {K, Q}, {A}, {K}, and {Q}. It may seem strange to the
student that {A, K} would be listed as an event when we are drawing only
one card. However, we would want to be able to discuss the probability
that the card is an ace or a king; therefore we need {A, K} as an event.
Similarly, we need S as an event so that we can discuss the probability of
an ace or king or queen. The inclusion of the empty set ∅ as an event
does not follow from intuitive considerations. It is described as an event
purely for mathematical convenience. Without its use, some of our
manipulations would be cumbersome.
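Listing the events by machine shows why there are eight of them: each of the three outcomes is either in or out of a subset, giving 2 × 2 × 2 = 8 subsets. The Python sketch below is our own illustration; the function name all_events is hypothetical, and the letters A, K, Q stand for the ace, king, and queen of hearts as in the text.

```python
from itertools import combinations


def all_events(sample_space):
    """Return every subset (event) of the sample space, from the empty set to S."""
    outcomes = sorted(sample_space)
    return [frozenset(subset)
            for size in range(len(outcomes) + 1)
            for subset in combinations(outcomes, size)]


S = {"A", "K", "Q"}          # ace, king, and queen of hearts
events = all_events(S)
print(len(events))           # 8, i.e. 2**3
for event in events:
    print(sorted(event) if event else "empty set")
```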


8.3 We now state certain axioms concerning probabilities for any of the 
events associated with a sample space. These axioms are motivated by 
the concept of long-run relative frequency, which lies between zero and 
one. This leads to the first axiom. 


Axiom 1. 0 ≤ P(A) ≤ 1 for every event A

Since every outcome belongs to the sample space, the long-run relative 
frequency of S is one. In the coin-tossing experiment of the previous 
lecture, for example, n(H or T)/n = 1 for every n. This leads to the second
axiom. 


Axiom 2. P(S) = 1 

The next axiom, although not so trivial, follows naturally from relative
frequency ideas. Often we ask for the probability that A or B will occur.
In the example of the previous section we may ask for P(ace or king). In
terms of relative frequency, it seems apparent that

$$ \frac{n(\text{aces or kings})}{n(\text{cards})} = \frac{n(\text{aces})}{n(\text{cards})} + \frac{n(\text{kings})}{n(\text{cards})} $$

This leads to the next axiom.

Axiom 3. P(A or B) = P(A) + P(B)

if A and B are disjoint (no points in common)

In terms of degree of belief, the idea is that information should be additive
if A and B have nothing in common. The set-theoretic way of saying that
A and B are disjoint is to say that A ∩ B = ∅ (the empty set). Also, A or
B may be denoted by A ∪ B. The third axiom is almost always referred to
as the addition rule.
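Under the additional assumption that the three cards are equally likely to be drawn, we can assign P(E) = n(E)/3 to every event and check the three axioms exhaustively. This Python sketch uses exact fractions; the function P and the variable names are our own, and the check covers only this small sample space, not the axioms in general.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset({"A", "K", "Q"})  # ace, king, queen of hearts


def P(event):
    """Assumes equally likely outcomes: P(E) = (points in E) / (points in S)."""
    return Fraction(len(event), len(S))


events = [frozenset(c) for size in range(len(S) + 1)
          for c in combinations(sorted(S), size)]

axiom_1 = all(0 <= P(e) <= 1 for e in events)
axiom_2 = P(S) == 1
# Axiom 3 (addition rule): check every pair of disjoint events.
axiom_3 = all(P(a.union(b)) == P(a) + P(b)
              for a in events for b in events if a.isdisjoint(b))
print(axiom_1, axiom_2, axiom_3)   # True True True
print(P(frozenset({"A", "K"})))    # 2/3, i.e. P(ace or king)
```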
8.4 Using set theory operations, we can derive several important rules from
the three axioms just stated. We will omit the derivations because the 
rules can be obtained just as directly by thinking about relative 
frequency ideas. However, it may help the student to think of these rules 
as resulting from the axioms, just as theorems of geometry result from 
the given axioms. 

Rule 1. P(∅) = 0
Rule 2. P(Ā) = 1 - P(A)
where Ā is “not A”

To illustrate Rule 2, the probability of getting an ace on one draw from a
standard deck is 1/13, while the probability of not getting an ace is
1 - 1/13 = 12/13.

8.5 Very quickly in the study of probability it seems natural to ask about the 
occurrence of two or more attributes, such as the card being an ace and a 
spade or the card being an ace and red, etc. The probability attached to 
such a joint event is called a joint probability and is denoted by P(AB), or
P(A ∩ B) in set theory symbols. To illustrate the idea, consider the
sample space portrayed in Table 8.1. 

By inspection, we would conclude

P(A) = 4/52 = 1/13
P(B) = 13/52 = 1/4
P(AB) = 1/52
P(ĀB) = 12/52
P(AB̄) = 3/52
P(ĀB̄) = 36/52


Table 8.1 Sample Space for Card Problem

                   Ace (A)   Not Ace (Ā)   Total
Spade (B)             1          12          13
Not spade (B̄)         3          36          39
Total                 4          48          52
The probabilities involving two events are called joint probabilities. The
probabilities involving only one event are called marginal probabilities,
apparently because the marginal probabilities can be obtained from the
numbers in the margins of tables such as Table 8.1.

Frequently the sample space is restricted to a subspace. Then the
probability is conditional on this smaller subspace. When we apply for a
life insurance policy, the insurance company may condition the
probability of survival on a smaller subspace such as the set of individuals
with no record of heart disease. In the card-drawing experiment suppose
we ask for the probability of an ace given a spade. We act as though the
sample space consisted only of spades, and the desired probability is 1/13.
This is denoted by P(A | B), the vertical line being read as “given that.”
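The joint and marginal probabilities of this section can be read off Table 8.1 mechanically. In this Python sketch (the dictionary layout and the function name marginal are our own choices), each joint probability is one cell count divided by 52, and each marginal probability is a row or column total, that is, a number in the table's margin, divided by 52.

```python
from fractions import Fraction

# Cell counts of Table 8.1, keyed by (row label, column label)
TABLE = {
    ("spade", "ace"): 1,      ("spade", "not ace"): 12,
    ("not spade", "ace"): 3,  ("not spade", "not ace"): 36,
}
N = sum(TABLE.values())  # 52 points in the sample space

# Joint probabilities: one cell divided by the grand total
joint = {cell: Fraction(count, N) for cell, count in TABLE.items()}


def marginal(label):
    """Marginal probability: a row or column total divided by the grand total."""
    return Fraction(sum(c for cell, c in TABLE.items() if label in cell), N)


print(marginal("ace"), marginal("spade"))                         # 1/13 1/4
print(joint[("spade", "ace")], joint[("not spade", "not ace")])   # 1/52 9/13
```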


8.6 Denote the number of points in A by n(A), in B by n(B), and in AB by
n(AB). From relative frequency ideas it follows very naturally that

$$ P(A \mid B) = \frac{n(AB)}{n(B)} = \frac{n(AB)/n(S)}{n(B)/n(S)} = \frac{P(AB)}{P(B)} $$

The only proviso is that P(B) ≠ 0. In this manner we also obtain

P(B | A) = P(AB)/P(A)

provided that P(A) ≠ 0.


8.7 Provided that the conditional probabilities exist [P(A) ≠ 0 and
P(B) ≠ 0], we can obtain another rule by cross-multiplication.

Rule 3. P(AB) = P(A | B)P(B)
              = P(B | A)P(A)
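The conditional probability formula and Rule 3 can be checked with the counts of Table 8.1, taking A = ace and B = spade. This short Python sketch uses exact fractions so that the equalities hold exactly; the variable names are our own.

```python
from fractions import Fraction

# Counts from Table 8.1: A = ace, B = spade
n_S, n_A, n_B, n_AB = 52, 4, 13, 1

P_A, P_B, P_AB = Fraction(n_A, n_S), Fraction(n_B, n_S), Fraction(n_AB, n_S)

# Conditional probability two ways: counting within B, and the ratio P(AB)/P(B)
p_A_given_B_count = Fraction(n_AB, n_B)
p_A_given_B_ratio = P_AB / P_B
p_B_given_A = P_AB / P_A

print(p_A_given_B_count, p_A_given_B_ratio, p_B_given_A)            # 1/13 1/13 1/4
# Rule 3: both cross-multiplied forms recover the joint probability
print(p_A_given_B_ratio * P_B == P_AB, p_B_given_A * P_A == P_AB)   # True True
```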

8.8 The rules of the preceding sections are illustrated by the mortality data of
Table 8.2. If we let A be the event that a person lives at least to age 50,
then

P(A) = .90718
Table 8.2 Survivors and Mortality Rate; United States, 1976

       Survivors per          Mortality Rate
Age    100,000 Live Births    per 1000 Survivors
50     90,718                  6.43
51     90,135                  7.00
52     89,504                  7.62
53     88,822                  8.30
54     88,085                  9.03
55     87,290                  9.77
56     86,437                 10.60
57     85,521                 11.56
58     84,532                 12.71
59     83,458                 13.98
60     82,291                 15.42

Source. Reproduced from the Statistical Bulletin, 59, pp. 8 and
9. Copyright © 1978 by Metropolitan Life Assurance Co. Also
reproduced from the National Center for Health Statistics.
Reprinted by permission of Metropolitan Life Assurance Co.


The probability of dying before age 50 is given by

P(Ā) = 1 - P(A) = .09282

Let B be the event that a person dies between age 50 and 51. Then

P(B) = (90,718 - 90,135)/100,000
     = .00583

In this case A ∩ B = B, so

P(AB) = P(B) = .00583

and

P(B | A) = P(AB)/P(A)
         = .00643

The mortality rate at each age, then, is the conditional probability of 
death in the next year given that the person has reached the given age. 
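The computation for age 50 can be repeated for each row. The Python sketch below (our own illustration; the names SURVIVORS, COHORT, and death_prob_next_year are hypothetical) takes the survivor counts of Table 8.2 and computes P(AB)/P(A) for each age, where A is “reaches this age” and B is “dies before the next birthday.” Multiplied by 1000, the results should reproduce the Mortality Rate column up to rounding.

```python
# Survivors per 100,000 live births at ages 50-60, from Table 8.2
SURVIVORS = {50: 90_718, 51: 90_135, 52: 89_504, 53: 88_822, 54: 88_085,
             55: 87_290, 56: 86_437, 57: 85_521, 58: 84_532, 59: 83_458,
             60: 82_291}
COHORT = 100_000


def death_prob_next_year(age):
    """P(B | A) = P(AB) / P(A), where A = 'reaches this age' and
    B = 'dies before the next birthday'; here AB = B."""
    p_a = SURVIVORS[age] / COHORT
    p_ab = (SURVIVORS[age] - SURVIVORS[age + 1]) / COHORT
    return p_ab / p_a


for age in range(50, 60):  # age 60 would need the age-61 count, not in the table
    print(age, f"{1000 * death_prob_next_year(age):.2f}")
```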
8.9 The observant student may have noticed with the card-drawing example
in Table 8.1 that P(A) = P(A | B) = 1/13. This simply means that the
probability of an ace is not affected by knowing that we have a spade. In
an intuitive sense, the two events are independent. Note that when
P(A | B) = P(A) or P(B | A) = P(B), Rule 3 gives us P(AB) = P(A)P(B).
Accordingly, we give a formal definition of independence.

Definition. A and B are independent if and only if P(AB) = P(A)P(B).
The multiplication of probabilities is often called the product rule.
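The definition can be applied directly with exact fractions, which avoids the rounding trouble of comparing decimals for equality. In this Python sketch the function name independent is our own; the first example uses the ace and spade of Table 8.1, and the second, added for contrast, uses an ace and a king on a single draw, which cannot both occur.

```python
from fractions import Fraction


def independent(p_a, p_b, p_ab):
    """A and B are independent if and only if P(AB) = P(A)P(B)."""
    return p_ab == p_a * p_b


# Ace (A) and spade (B), from Table 8.1
print(independent(Fraction(4, 52), Fraction(13, 52), Fraction(1, 52)))   # True
# Ace and king on a single draw: disjoint, so P(ace and king) = 0
print(independent(Fraction(4, 52), Fraction(4, 52), Fraction(0)))        # False
```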

8.10 We still need a rule for P(A or B). This rule is easily developed by looking
at Figure 8.1. Relative frequency ideas suggest that

$$ \frac{n(A \cup B)}{n(S)} = \frac{n(A) + n(B) - n(A \cap B)}{n(S)} $$

This leads to the following rule:

Rule 4. P(A or B) = P(A) + P(B) - P(AB)

As a special case of Rule 4, when A and B are disjoint (AB = ∅), then
P(AB) = 0 and P(A or B) = P(A) + P(B). In probability literature
disjoint events are said to be mutually exclusive. The addition of 
probabilities of mutually exclusive events is usually called the addition 
rule. 



Figure 8.1 Venn Diagram for A or B. 
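Rule 4 can be checked on a full deck by counting the union directly and comparing with the formula. This Python sketch builds the 52 cards as (rank, suit) pairs, assumed equally likely; the labels and the helper P are our own.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = set(product(RANKS, SUITS))  # 52 equally likely cards


def P(event):
    return Fraction(len(event), len(DECK))


aces = {card for card in DECK if card[0] == "A"}
spades = {card for card in DECK if card[1] == "spades"}

direct = P(aces.union(spades))                                 # count the union
rule_4 = P(aces) + P(spades) - P(aces.intersection(spades))    # Rule 4
print(direct, rule_4)  # 4/13 4/13
```

Without the subtracted term, the ace of spades would be counted twice and the sum would be 17/52 instead of 16/52.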

8.11 One particular way of writing a conditional probability has drawn 
extensive comment. Known as Bayes’ rule or Bayes’ theorem, it simply
expresses one conditional probability in terms of other conditional 
probabilities and marginal probabilities. 
Bayes' Rule (Rule 5)

$$ P(B \mid A) = \frac{P(A \mid B)P(B)}{P(A \mid B)P(B) + P(A \mid \bar{B})P(\bar{B})} $$


Example. The authorship of books is often a controversial issue among 
scholars of classical works, and one statistical investigation of a work of 
disputed authorship was referred to in Lecture 5. Suppose that we have a 
collection of works, some of which are authored by Paul, and that we 
have a test for authorship. Let A = indication that Paul authored a given 
work and B = event that the work was actually authored by Paul. 
Suppose we have great confidence in our test so that 


P(A | B) = .95 and P(A | B̄) = .10

If the test indicates for a given work that Paul is the author, we would be
interested in the probability that Paul actually is the author given that
the test indicates so. That is, we are interested in P(B | A). Using Bayes’
rule, we obtain

$$ P(B \mid A) = \frac{(.95)P(B)}{(.95)P(B) + (.10)P(\bar{B})} $$

In order to go further, we must have P(B). If P(B) is small, say .10,

$$ P(B \mid A) = \frac{(.95)(.10)}{(.95)(.10) + (.10)(.90)} = \frac{.095}{.095 + .090} = .5135 $$

On the other hand, if P(B) is large, say .90,

$$ P(B \mid A) = \frac{(.95)(.90)}{(.95)(.90) + (.10)(.10)} = \frac{.855}{.855 + .010} = .9884 $$

In both cases, an indication by the test that Paul authored the work
gives a conditional probability larger than the marginal probability.
However, P(B | A) is not as large as might have been anticipated if P(B) is
small.
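Bayes' rule is short enough to write as a single function, which makes it easy to see how strongly the answer depends on the marginal probability P(B). In this Python sketch the function name posterior is our own; the inputs are the test characteristics of the Paul example, P(A | B) = .95 and P(A | B̄) = .10.

```python
def posterior(p_a_given_b, p_a_given_not_b, p_b):
    """Bayes' rule (Rule 5): P(B | A) from P(A | B), P(A | not B), and P(B)."""
    numerator = p_a_given_b * p_b
    return numerator / (numerator + p_a_given_not_b * (1 - p_b))


for prior in (0.10, 0.90):
    print(prior, round(posterior(0.95, 0.10, prior), 4))
# 0.1 0.5135
# 0.9 0.9884
```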


8.12 In this lecture we have presented the basic rules for probability
calculations. Some of the rules are intuitive; others are not. In the 
application of even the more intuitive rules there come times when the 
answers are startling. Probability results obtained from meticulous 
application of stern rules often do not agree with our intuition, forcing us
to sharpen our definitions and to examine our logic. 

In the following lecture we will give attention to the problem of 
probability calculations. 


SUMMARY. The axiomatic theory of probability takes probability as 
an undefined quantity satisfying certain axioms; then rules and theorems 
are obtained from the axioms. 

The set of all possible outcomes, denoted by S, is called the sample 
space, and any subset A of S is called an event. In the case of two events A
and B, the probability of either one alone is called a marginal probability, the
probability of both, denoted by P(AB), is called a joint probability, and
the probability of A with the sample space restricted to the points in B is
called the conditional probability of A given B, P(A | B).

The joint probability can be obtained from the product of a
conditional probability and a marginal probability. That is

P(AB) = P(B | A)P(A)

When the joint probability equals the product of marginal probabilities,
the events are called independent.

The probability of either A or B is given by

P(A or B) = P(A) + P(B) - P(AB)

When A and B are disjoint, they are called mutually exclusive, and the
probability of A or B is simply the sum of marginal probabilities.

Bayes' rule allows the calculation of P(B | A) given P(A | B), P(A | B̄),
P(B), and P(B̄). That is, it gives one conditional probability in terms
of another conditional probability and the appropriate marginal 
probabilities. 
EXERCISES

1. A ball is drawn from an urn containing four balls numbered 1,2,3,4. 
Give the sample space and all of the 16 possible events. 

2. Two balls are drawn from the urn of Exercise 1, one at a time, 
without replacement of the first ball before the second ball is drawn. 
List the six possible outcomes in the sample space and the 64 
possible events. 

3. For the sample space in Table 8.1 calculate 

P(A | B), P(A | B̄), P(B | A), and P(B | Ā).

4. From the data given in Table 8.2 verify the mortality rates for ages 50 
to 59. 

5. From the data of Table 8.2 calculate the probability of surviving 
until age 61. 

6. What interpretation of probability is being used in probability 
statements based on Table 8.2? 

7. An urn contains three balls numbered I, II, and III. Two balls are 
drawn at random from the urn, one at a time, without replacement of 
the first ball, before the second ball is drawn. 

a. List the six points in the sample space, each point specifying an 
outcome for the first and second ball drawn. 

b. Let A be the event that ball I is drawn on the first draw and
determine n(A). What is P(A)?

c. Let B be the event that ball I is drawn on the second draw and
determine n(B). What is P(B)?

d. Let C be the event that ball II is drawn on the first draw and
determine n(C). What is P(C)?

e. What is P(AC)?

f. What is P(A | C)?
8. Work Exercise 7 supposing that the first ball drawn is replaced
before the second ball is drawn. For part a there will be nine points in 
the sample space. 

9. Suppose that in Exercise 7, balls I and II are white and III is red. Let 
A = event that first ball drawn is white and B = event that second 
ball drawn is white. Determine: 

a. P(A).

b. P(B).

c. P(AB).

d. P(A | B).

e. P(B | A).

10. Two cards are drawn (without replacement) at random from the four 
aces of a bridge deck. 

a. List the 12 points in the sample space. 

b. Let A be the event of getting the ace of spades. List the points in A
and determine P(A).

c. Let B be the event of getting the ace of hearts. List the points in B
and determine P(B).

d. Determine P(A | B), the probability of the ace of spades given the
ace of hearts.

e. Determine the probability of the ace of spades given that we 
obtain at least one red ace. 

11. Show that ∅ is independent of any event A. Also show that ∅ and A
are mutually exclusive.

12. A fair coin is tossed twice. Let A be the event of heads on the first toss
and B be the event of heads on the second toss. Show that A and B
are independent but not mutually exclusive.

13. If we are told that two events are independent, can we conclude that
they are not mutually exclusive?

14. P(A or B) = -^

P(A) = i

P(B) = i

Are A and B mutually exclusive?

15. P(A | B) = }; P(B) = i

Find P(A) and P(B).
16. We are given that

P(A) = i   P(B) = ^

P(C) = -h   P(AB) = -k
P(AC) = ^   P(BC) = ^

P(ABC) =

a. P(A or B or C) = ?

b. Sketch the Venn diagram showing probabilities for this problem.

17. We are given that 

P(A) = P(B) = P(C) = \

P(AB) = P(AC) = P(BC) = i
P(ABC) = 0 

a. Sketch the Venn diagram and show the probabilities for this 
problem. 

b. Show that A, B, and C are pairwise independent but not jointly 
independent. 




PASCAL'S CALCULATING MACHINE. Courtesy of IBM. 
COMPUTATION OF
PROBABILITY 


9.1 The so-called classical definition of probability states that if an
experiment can result in n equally likely outcomes and if n(A) of these
outcomes possess the property A, the probability of A is given by

P(A) = n(A)/n
Of course, this is not really a definition because it contains the term 
“equally likely,” which presumes a meaning of probability. Nevertheless, 
it does provide a useful way of formalizing our thinking about 
probability. In this lecture we will present a number of examples where 
all points in the sample space are taken to be equally likely. 


Example 1 . An experiment consists of drawing two balls in succession, 
without replacement of the first ball, from an urn containing two white 
balls and one red ball. We are asked for the probability that the second 
ball is white. For this example, let us number the balls with the 
corresponding colors as follows. 


Ball   Color
1      W
2      W
3      R
The sample space in terms of ball number is as follows.

Outcome   First Ball   Second Ball
1         1            2
2         1            3
3         2            3
4         2            1
5         3            1
6         3            2


The event A, consisting of the outcomes where the second ball is white, is 
the following. 


Outcome   First Ball   Second Ball
1         1            2
4         2            1
5         3            1
6         3            2


Thus the probability that the second ball is white is n(A)/n = 4/6 = 2/3. This
answer may be counter to the intuition of students, who unconsciously
think in terms of conditional probability.
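For small problems the sample space can be listed by machine rather than by hand. This Python sketch (our own illustration; the names COLOR and event_A are hypothetical) generates the six ordered draws of Example 1 with itertools.permutations and counts those whose second ball is white. Python lists the points in a different order from the table above, but they are the same six points.

```python
from fractions import Fraction
from itertools import permutations

COLOR = {1: "W", 2: "W", 3: "R"}  # ball number -> color, as in Example 1

sample_space = list(permutations(COLOR, 2))                     # ordered (first, second)
event_A = [pt for pt in sample_space if COLOR[pt[1]] == "W"]    # second ball white

print(sample_space)
print(len(event_A), len(sample_space), Fraction(len(event_A), len(sample_space)))  # 4 6 2/3
```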

In this example it is a simple matter to list all of the points in the 
sample space and to count the points belonging to the event A. For more 
complex problems the counting may become an onerous task unless we 
make use of counting formulas and counting algorithms. 


9.2 Mathematics abounds with fascinating counting problems, and the 
solution to some of these problems requires the highest ingenuity. 
However, many problems can be solved with elementary combination 
and permutation formulas. The arrangements of balls of Example 1 are 
called permutations because the order is important. That is, the 
arrangement 1, 2, 3 is considered to be different from 3, 2, 1. By contrast,
the arrangement of cards in a bridge hand is a combination because the
order of cards is not important: the value of the hand does not depend on
the sorting. It was mentioned in Lecture 2 that the word permutation
was first used by Bernoulli. The word combination was first used
in the modern sense in 1666 by Leibniz in his “Dissertation Concerning
the Combinatorial Arts.” 

The number of combinations of n things taken r at a time is denoted by
$\binom{n}{r}$ and is given by

$$ \binom{n}{r} = \frac{n!}{r!(n - r)!} $$

where n! (n factorial) is given by n! = n(n - 1)(n - 2) ⋯ 1, with the usual
convention that 0! = 1.

The number of permutations of n things taken r at a time is given by
P(n, r) = n!/(n - r)!. To illustrate the use of this formula, consider
Example 1. The number of points in S is obtained from P(3, 2) = 3!/1!
= 6.
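Both formulas are easy to evaluate by machine. The Python sketch below writes them out with factorials and compares the results with the standard library's math.comb and math.perm (available in Python 3.8 and later); the function names combinations_count and permutations_count are our own.

```python
from math import comb, factorial, perm  # comb and perm need Python 3.8+


def combinations_count(n, r):
    """Combination formula n! / (r!(n - r)!), written out directly."""
    return factorial(n) // (factorial(r) * factorial(n - r))


def permutations_count(n, r):
    """Permutation formula n! / (n - r)!, written out directly."""
    return factorial(n) // factorial(n - r)


print(permutations_count(3, 2), perm(3, 2))         # 6 6, as in Example 1
print(combinations_count(52, 13) == comb(52, 13))   # True
print(factorial(0))                                 # 1, the usual convention
```

Integer division (//) keeps the results exact even for large n, since each quotient is a whole number.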


Example 2. An experiment consists of dealing 13 cards from a well-shuffled
bridge deck. What is the probability that the hand contains the
ace of spades and the ace of hearts? To solve this problem, note that we
need to calculate the ratio n(A)/n, and the major problem is a counting
problem. Now n is simply the number of ways of choosing 13 cards from
the deck, $\binom{52}{13}$. The numerator n(A) requires a little more
thought. In every hand containing the two specified aces, there are 11
other cards, which can be selected from the remaining 50 cards in
$\binom{50}{11}$ ways. Therefore the probability is

$$ \binom{50}{11} \Big/ \binom{52}{13} = \left(\frac{50!}{11!\,39!}\right)\left(\frac{13!\,39!}{52!}\right) = \frac{(13)(12)}{(52)(51)} = \frac{1}{17} $$
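The exact answer can be computed with math.comb and cross-checked by simulated deals. In this Python sketch the labels 0 and 1 for the two named aces are arbitrary, the seed and number of trials are our own choices, and random.Random.sample is used to deal a 13-card hand without replacement.

```python
import random
from fractions import Fraction
from math import comb

# Exact answer: hands containing both named aces over all 13-card hands
exact = Fraction(comb(50, 11), comb(52, 13))
print(exact)  # 1/17

# Rough Monte Carlo check; cards 0 and 1 stand for the two named aces
rng = random.Random(0)
trials = 100_000
hits = sum(1 for _ in range(trials) if {0, 1} <= set(rng.sample(range(52), 13)))
print(hits / trials)  # should be close to 1/17, about 0.0588
```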


9.3 To summarize these ideas, many of the simple problems can be
conceptualized by thinking of the list S of all the possible outcomes
judged to be equally likely and of the subset A in this list. The desired
probability is then obtained by forming the ratio of the number of points
in A, n(A), to the number of points in S, n. At this point the need arises for
counting devices; this need is partially met by combinations and
permutations.

Confusion usually arises for beginning students (and for me, for that 
matter!) about whether to use combination or permutation formulas. 
This confusion can be avoided if we think carefully about the points in S,
the sample space. If they are permutations, naturally we use permutation 
formulas; if they are combinations, we use combination formulas. 


9.4 The famous birthday problem is concerned with the probability that
when r people are selected at random from a large population, at least
two have the same birthday (see Feller [2], and Munford [3] for a recent
article). Ignoring leap years, the sample space is the set of all possible
r-tuples $(x_1, x_2, \ldots, x_r)$, where each of the x's can equal 1, 2, 3, ..., 365.
Then the number of points in the sample space is

$$ n(S) = 365^r $$

Let A be the event that at least two people have the same birthday; Ā is
the event that all birthdays are different. Ā is the set of all permutations of
365 dates taken r at a time. Thus

n(Ā) = P(365, r)

Then

$$ P(\bar{A}) = P(365, r)/365^r $$

and

P(A) = 1 - P(Ā)

We can evaluate P(A) for various values of r. For r = 23, P(A) exceeds
one-half. Thus, for 23 or more people, the odds are in favor of at least two
people having the same birthday. It has been shown by several authors
(e.g., Munford [3]) that if birthdays are not all equally likely, P(A) is
increased.
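Evaluating P(A) for successive values of r is a short loop. This Python sketch assumes, as the text does, 365 equally likely birthdays and no leap years; it uses math.perm (Python 3.8 and later) for P(365, r), and the function name p_shared_birthday is our own.

```python
from math import perm  # Python 3.8+


def p_shared_birthday(r, days=365):
    """P(A): at least two of r people share a birthday, assuming all days
    equally likely and ignoring leap years."""
    p_all_different = perm(days, r) / days ** r  # P(not A) = P(365, r) / 365^r
    return 1 - p_all_different


smallest_r = next(r for r in range(1, 366) if p_shared_birthday(r) > 0.5)
print(smallest_r, round(p_shared_birthday(smallest_r), 4))  # 23 0.5073
```

Python's exact integers keep 365**r and P(365, r) exact, and only the final division is rounded to a floating-point number.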


9.5 The Society for Psychical Research was founded in London in 1882
because of the conviction that there was a need for scientific study of
paranormal events (odd and unexplained kinds of events). The term
parapsychology is used to describe experimental psychical research. A
review of experimental investigations of extrasensory perception is given
by Thakur [4] and Thouless [5]. Since the early card-guessing
experiments, many experiments have been devised to test the hypothesis
that extrasensory perception does exist. Regardless of the outcome, there
is always the question of whether the results could be explained by
chance. In fact, one of the chapters in the book by Thouless is entitled
“Can the Apparent Successes be Explained by Chance?”
A psychic deals the face cards from spades and hearts, face up, in a
random order, looking at the cards and concentrating on their color. In
another room a colleague who knows there are four red and four black
cards tries to guess the color of the card drawn on a signal from an
impartial observer that a card has been dealt. If six out of eight cards are
correctly identified by color, would you feel that this supports the claim
of telepathy?

The possible outcomes of the experiment are the possible records of 
eight guesses with four red and four black. We can find the number of 
points in the sample space, n(S), by thinking of the number of ways that 
we can assign red to four of the guesses (black going to the other four). 
This is simply the number of combinations of eight things taken four at a 
time. So

$$n(S) = \binom{8}{4} = 70$$
Let us suppose that all 70 outcomes are equally likely and find the 
probability of identifying correctly six or more cards. 

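In order to identify six cards correctly by color, three red cards and
three black cards must be identified. The number of ways to identify
three red cards correctly is $\binom{4}{3} = 4$, and for each of these ways, three
black cards can be identified in $\binom{4}{3} = 4$ ways. So the number of ways to
identify six cards is the product $\binom{4}{3}\binom{4}{3} = 16$. (Exactly seven correct is
impossible: one wrong red call forces one wrong black call.) Since there
is only one way to identify all eight cards correctly, the probability of
identifying six or more cards correctly is

$$P(\text{six or more}) = \frac{16 + 1}{70} = \frac{17}{70} \approx .243$$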

Even if the colleague is guessing, there is almost a 25% chance of 
identifying correctly six or more cards in this experiment. 
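Because the sample space here has only 70 points, the count can be checked by brute force. This Python sketch lists every record of four "red" calls among eight positions and counts how many calls are correct. Labeling the truly red cards as positions 0 to 3 is an assumption made only for convenience; any fixed labeling gives the same counts.

```python
from fractions import Fraction
from itertools import combinations


def color_guess_distribution(n_cards: int = 8, n_red: int = 4) -> dict:
    """Distribution of the number of correct color calls under pure guessing.

    Assumes the guesser calls exactly n_red cards red and that all
    C(n_cards, n_red) such records are equally likely, as in the text.
    """
    red_cards = set(range(n_red))  # fixed labeling of the truly red positions
    records = list(combinations(range(n_cards), n_red))  # positions called red
    counts = {}
    for called_red in map(set, records):
        # a call is correct when "called red" agrees with "is red"
        correct = sum((pos in called_red) == (pos in red_cards) for pos in range(n_cards))
        counts[correct] = counts.get(correct, 0) + 1
    return {k: Fraction(v, len(records)) for k, v in sorted(counts.items())}


dist = color_guess_distribution()
p_six_or_more = sum(p for k, p in dist.items() if k >= 6)
print(dist)
print(p_six_or_more, float(p_six_or_more))
```

The distribution should put 1, 16, 36, 16, and 1 of the 70 records on 0, 2, 4, 6, and 8 correct calls, so `p_six_or_more` comes out to 17/70.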


9.6 According to Duncan [1], it was only in the 1920s that statistical theory 
began to be applied effectively to quality control. Walter A. Shewhart of 
the Bell Telephone Laboratories began the work and in 1931 published a 
book entitled Economic Control of Quality of Manufactured Product. 
Following Shewhart’s visit to London in 1932, the development of 
statistical quality control in Britain was swift. Although the development 
lagged in the United States, it received a major impetus during World 
War II. Duncan reviews the development of acceptance sampling plans 
from the Dodge-Romig Tables up to the present. 
The simplest acceptance sampling plan works as follows. From a
manufactured lot of N items we take a random sample of n items and 
accept the lot if the number of defective items is less than or equal to a 
specified number, c. For example, we might accept a lot of 10 items if we 
found three defectives or fewer in a sample of size 5. 

Anyone can devise an acceptance sampling plan. The next problem is 
to determine the properties, or operating characteristics, of the plan 
devised. We might try to determine the probability of accepting lots of 
varying quality. If the lot of 10 items contains four defective items, what 
is the probability of accepting the lot? 

The number of points in the sample space is the number of
combinations of 10 things taken five at a time, $\binom{10}{5} = 252$. The number
of samples with three defectives or fewer is

$$\binom{4}{3}\binom{6}{2} + \binom{4}{2}\binom{6}{3} + \binom{4}{1}\binom{6}{4} + \binom{4}{0}\binom{6}{5} = 60 + 120 + 60 + 6 = 246$$

Therefore the probability of accepting the lot containing four defective
items is 246/252 = .976.


Figure 9.1 Operating Characteristic Curve (probability of acceptance versus percent defective).
Similarly, we can calculate the probability of acceptance for any
number of defectives in the lot and plot the probability of acceptance 
versus percent defective in the lot. Such a curve is called an operating 
characteristic curve (OC curve), and the OC curve for the present 
example is given in Figure 9.1. 
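The OC curve can be computed the same way for every possible number of defectives in the lot. The sketch below uses a hypothetical helper, `prob_accept`, that counts samples with at most c defectives exactly as in the calculation for four defectives, assuming sampling without replacement from a lot with a known number of defectives.

```python
from math import comb


def prob_accept(lot_size: int, sample_size: int, defectives: int, c: int) -> float:
    """Probability that a random sample drawn without replacement has at most c defectives.

    Sums C(D, x) * C(N - D, n - x) over x = 0..c and divides by the
    C(N, n) equally likely samples. math.comb returns 0 when k > n.
    """
    good = lot_size - defectives
    favorable = sum(
        comb(defectives, x) * comb(good, sample_size - x)
        for x in range(min(c, sample_size) + 1)
    )
    return favorable / comb(lot_size, sample_size)


# OC curve for the plan in the text: N = 10, n = 5, accept if at most c = 3 defectives
for d in range(11):
    print(f"{10 * d:3d}% defective   P(accept) = {prob_accept(10, 5, d, 3):.3f}")
```

For four defectives this reproduces 246/252, about .976; plotting the printed values against percent defective gives a curve of the kind shown in Figure 9.1.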

9.7 With the very high standards required by the development of missile
systems and space travel, extensive development of reliability engineering
has occurred in the last 25 years. Suppose three electrical components
A, B, and C are connected in series, as in Figure 9.2.



Figure 9.2 Components in Series. 


In order for the system to work, all three components must function. If
the component probabilities are $p_A = .95$, $p_B = .90$, and $p_C = .99$, and the
components function independently, the probability of the system
working (the reliability) is (.95)(.90)(.99) = .84645. One way of increasing
the system reliability is to build two such systems in parallel, as in
Figure 9.3. Then the system reliability is



$$P(\text{I or II}) = P(\text{I}) + P(\text{II}) - P(\text{I and II}) = .84645 + .84645 - (.84645)^2 = .97642$$

Figure 9.3 Parallel Systems.

Alternatively, we can use parallel circuits, as in Figure 9.4. 
Figure 9.4 Parallel Circuits.


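For the system in Figure 9.4, where I, II, and III denote the parallel pairs
of A, B, and C components, the system reliability is

$$P(\text{I and II and III}) = P(\text{I})\,P(\text{II})\,P(\text{III})$$

But

$$P(\text{I}) = .95 + .95 - (.95)^2 = .9975$$

$$P(\text{II}) = .90 + .90 - (.90)^2 = .99$$

and

$$P(\text{III}) = .99 + .99 - (.99)^2 = .9999$$

Therefore the system reliability is (.9975)(.99)(.9999) = .98743, and in
this case the system in Figure 9.4 is more reliable than the one in Figure 9.3.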
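The two designs can be compared with two small helper functions. This Python sketch assumes, as the calculations above implicitly do, that components fail independently; `series` and `parallel` are illustrative names, not standard routines.

```python
from math import prod


def series(*reliabilities: float) -> float:
    """A series system works only if every component works (independence assumed)."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """A parallel system fails only if every branch fails (independence assumed)."""
    return 1 - prod(1 - r for r in reliabilities)


p_a, p_b, p_c = 0.95, 0.90, 0.99
single = series(p_a, p_b, p_c)                # Figure 9.2
two_systems = parallel(single, single)        # Figure 9.3
paired = series(parallel(p_a, p_a), parallel(p_b, p_b), parallel(p_c, p_c))  # Figure 9.4
print(f"{single:.5f} {two_systems:.5f} {paired:.5f}")
```

The three printed values should agree with .84645, .97642, and .98743. For two identical branches, `parallel(r, r)` equals 2r - r², the formula used in the text.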


9.8 We often have opportunities to make use of simple tree diagrams in the
solution of probability problems. We will not give a general description
of a tree diagram, but we will give an example. Suppose that we are asked
for the probability of exactly one ace in the draw of three cards from a
bridge deck. We can imagine the cards being drawn sequentially, and the
possibilities on each draw are ace or not-ace. We can then visualize the
possible outcomes as in the tree in Figure 9.5. The probabilities shown
on the branches are conditional on the outcome of the previous draw (or
draws), and the reader should verify them. The probability of climbing
from the base to the end of any branch is obtained by multiplying the
probabilities along the way. Thus the probability of climbing to branch 1
(three aces) is

$$\frac{1}{13} \cdot \frac{3}{51} \cdot \frac{2}{50} = \frac{1}{(13)(17)(25)}$$

The probability of getting exactly one ace in three draws is the sum of
probabilities for branches 4, 6, and 7 and is given by

$$\left(\frac{1}{13}\right)\left(\frac{48}{51}\right)\left(\frac{47}{50}\right) + \left(\frac{12}{13}\right)\left(\frac{4}{51}\right)\left(\frac{47}{50}\right) + \left(\frac{12}{13}\right)\left(\frac{47}{51}\right)\left(\frac{4}{50}\right) = \frac{1128}{5525} \approx .204$$
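The tree can also be walked mechanically. In this Python sketch each path is a tuple of True (ace) and False (not-ace) values, and the conditional probabilities are multiplied along the path as described above. We assume the branches are numbered in the order produced by `itertools.product`, which places the one-ace branches at positions 4, 6, and 7, consistent with the text; `branch_probability` is an illustrative name.

```python
from fractions import Fraction
from itertools import product


def branch_probability(path, aces: int = 4, deck: int = 52) -> Fraction:
    """Multiply the conditional probabilities along one path of the tree.

    Draws are without replacement, so the deck (and, after an ace,
    the ace count) shrinks by one after each draw.
    """
    p = Fraction(1)
    for is_ace in path:
        if is_ace:
            p *= Fraction(aces, deck)
            aces -= 1
        else:
            p *= Fraction(deck - aces, deck)
        deck -= 1
    return p


# Branch order: AAA, AAN, ANA, ANN, NAA, NAN, NNA, NNN (assumed to be branches 1-8)
branches = list(product([True, False], repeat=3))
probs = [branch_probability(b) for b in branches]
print(probs[0])                                                 # branch 1: three aces
print(sum(p for b, p in zip(branches, probs) if sum(b) == 1))   # exactly one ace
```

The first line should print 1/5525, which is 1/((13)(17)(25)), and the second 1128/5525.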
Figure 9.5 Tree diagram for the first, second, and third draws.



9.9 In conclusion, we should mention that obtaining a solution to a simply 
stated probability problem can be extremely difficult and students may 
be discouraged by initial difficulties. These difficulties should not be 
allowed to obscure the conceptually simple process involved. 


SUMMARY. In many probability problems it seems reasonable to
assume that all points in the sample space are equally likely, or equally
probable. Then the probability of an event, A, is simply the number of
points in A, n(A), divided by the number of points in S, n(S). That is,
P(A) = n(A)/n(S). This formula is known as the classical definition of
probability, but it is not a definition at all. In these problems the
calculation of a probability is essentially a counting problem: the number
of points in A and in S must be counted.

For purposes of counting, the combination and permutation formulas 
are invaluable. Once the points in S have been listed or even partially 
listed, it will be clear whether the points are permutations or 
combinations. 

The calculation of probabilities is illustrated with several examples, 
including an urn problem, the birthday problem, extrasensory perception, 
quality control, and reliability theory. 

Tree diagrams are also useful for obtaining probabilities, particularly 
for events consisting of sequential trials. 
EXERCISES

1. Evaluate the probability that at least two of r people have the same
birthday for r = 5, 10, 15, 20, 23, 25, 30, 35. Graph the probability
versus r. In several of your social groups inquire about people with
matching birthdays. Are the results consistent with the theoretical
answers?

2. Conduct the color-guessing experiment described in Section 9.5.
Do your experimental data support the claim of mental telepathy?
3. Verify the OC curve of Figure 9.1 by calculating the probability of
acceptance for p = .1, .2, ..., .9.

4. Show that 




5. Show that





6. Determine the OC curve for an acceptance sampling plan that 
accepts the lot with two defective items or fewer in a sample of four 
items from a lot of size 15. 

7. A 13-card bridge hand is drawn at random from a well-shuffled 
bridge deck. 

a. Let A be the event that the hand contains two aces. What is P(A)?

b. Let B be the event that the hand contains three aces. What is P(B)?

c. What is the probability that the hand contains four aces? 

8. To continue with Exercise 7, suppose we are not interested in bridge 
hands but in the outcome for the first card drawn, the second card 
drawn, etc. 

a. Are the points in the sample space combinations or 
permutations? 

b. What is the number of points in the sample space? 

c. Let A be the event that the first card drawn is a spade. Determine
n(A) and P(A).

9. If each of the components in Figure 9.2 has a reliability of 0.99, what 
is the system reliability? 

10. With 10 components in series as in Figure 9.2, what must the 
component reliability be in order for the system reliability to be 
0.90? 

11. With three components each of A, B, and C with component
reliabilities, respectively, of 0.95, 0.90, and 0.99, how can we achieve
the greatest system reliability? Should we use three parallel systems 
as in Figure 9.3 or should we use the components in parallel as in 
Figure 9.4? 

12. Users of books in a library are asked not to reshelve their own books. 
If a book is borrowed from a shelf holding 50 books and returned to 
a random place on that same shelf, what is the probability that it will 
be in the proper location? 

13. If two of the books (see Exercise 12) are returned to random places, 
what is the probability that both books are in the proper location? 
14. A two-stage acceptance sampling plan for a lot size of 10 proceeds as
follows. An initial sample of size 3 is drawn. If there are no defective
items, the lot is accepted. If there are two or three defective items, the
lot is rejected. If there is only one defective item, another two items
are drawn. If no additional defective items are found, the lot is
accepted; otherwise it is rejected. Determine the OC curve for p = .1,
.2, ..., .9.

15. A coin is tossed five times. Calculate the probability of two or more 
heads. 

16. A woman living outside a large city drives her car to the nearest 
railroad station, parks it there, and takes the train into town to 
attend the opera. Once there she becomes concerned that she may 
have left the headlights on. If so, the car battery will be down by the 
time she gets back to the station after the opera. She estimates that
the probability that the lights are on is .9. She further estimates that 
the probability of being able to go back to the station, check the 
lights, and return in time for the opera is .5. What should she do if she 
wishes to maximize the probability of getting home in her car and 
attending the opera? 

17. A homeowner’s lawn mower has become worn and sometimes will 
not start again as long as the motor is hot. To mow his lawn, he must 
kill the motor and empty the grass catcher four times. The 
probability that the mower will start again if he kills the motor is .9, 
and the probability that he can mow the whole lawn without 
running out of gasoline is .8. If he leaves the mower stopped long 
enough to refill it, the probability of its starting again is .7. Should he 
plan on refilling the mower on one of the stops or not? 

18. A car owner is debating whether to order spark plug wires
from an auto parts catalog. He judges that the probability that new 
wires will cure his ignition problems is .3. If spark plug wires are not 
the problem, he plans to take the car to a garage for a tune-up. What 
should he do to minimize his expected cost if the spark plug wires 
cost $18 and the tune-up will cost $80? 

19. It costs $9 for each spraying of a large, papershell pecan tree. If all
three sprayings are at the right time, the crop will be 100 pounds; if
there are two sprayings at the right time, it will be 60 pounds; if there is
one spraying at the right time, it will be 30 pounds; if the tree is not
sprayed, there is no crop. The probability of each spraying being at
the right time is .9. What is the expected net return if pecans are worth
$2 per pound?
10.1 And strange to tell, among that Earthen Lot
Some could articulate, while others not: 

And suddenly one more impatient cried — 

“Who is the Potter, pray, and who the Pot?”*

Edward Fitzgerald’s translation of the Rubaiyat of Omar Khayyam [2] 
seems appropriate as a starting point for a discussion of the binomial 
distribution. The binomial expansion and the binomial coefficients have 
been known to mathematicians for hundreds of years but have been 
rediscovered again and again until it seems impossible to say who is the 
“potter” and who is the “pot.” Interestingly, Omar Khayyam himself 
may have been familiar with the binomial expansion and coefficients. 
Before giving a few traces of the origin of the binomial expansion, we 
should review a few facts about the expansion itself. 

10.2 In elementary algebra courses students learn that

$$(a + b)^2 = a^2 + 2ab + b^2$$

and

$$(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$$

By multiplying (a + b) times the expanded form of $(a + b)^3$, we obtain

$$(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$$


* Reprinted by permission of the publishers George Allen & Unwin Ltd. from The
UdHonlot Copyright © 1959 by George Allen & Unwin 
At this point students may be motivated to memorize some rule to
generate the terms of the expansion more easily. It was observed long
ago that the sum of the exponents is constant from term to term. That is,
if $Ca^xb^y$ is a term in the expansion of $(a + b)^n$, x + y always equals n.
What is needed is a simple expression for the coefficient. It was also
discovered long ago that the coefficient of the term involving $a^xb^{n-x}$ was
the number of combinations of n things taken x at a time. Then the
binomial expansion can be written as

$$(a + b)^n = \sum_{x=0}^{n} \binom{n}{x} a^x b^{n-x}$$



10.3 Pascal’s triangle provides a simple device for generating the binomial
coefficients; the rows of the triangle correspond to different values of n.
The elements in each row are the coefficients of the terms in the
expansion for that value of n. The enticing property of the array is that
any coefficient (other than 1) is the sum of the two coefficients to the
immediate left and right in the row above it. So, for n = 8, we would
obtain the coefficients 1, 8 = 1 + 7, 28 = 7 + 21, 56 = 21 + 35, 70 = 35
+ 35, 56 = 35 + 21, 28 = 21 + 7, 8 = 7 + 1, and 1.
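Pascal's addition rule translates directly into code. The Python sketch below builds rows 0 through n of the triangle by adding adjacent entries of the previous row; comparing the result with `math.comb` is a quick way to confirm that the rows are the binomial coefficients. The function name `pascal_rows` is ours.

```python
from math import comb


def pascal_rows(n_max: int) -> list[list[int]]:
    """Rows 0..n_max of Pascal's triangle, built with the addition rule."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        # each interior entry is the sum of the two entries above it
        rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return rows


rows = pascal_rows(8)
print(rows[8])
print(rows[8] == [comb(8, x) for x in range(9)])
```

Row 8 should print as [1, 8, 28, 56, 70, 56, 28, 8, 1], matching the sums listed above.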

According to Hogben [4, p. 162], Khayyam may have been familiar
with the binomial expansion as early as 1000 A.D. In Lecture 2 we called

attention to the reference by Smith, tracing the triangle to a Chinese
writer as early as 1303. Eves [1] points out that the binomial coefficients
were given by Michael Stifel in Arithmetica integra, published in 1544.
Pascal’s name seems to have been firmly attached to the triangular array
of coefficients about 1665, although a triangle-type array was also given
by Bernoulli in the 1713 Ars Conjectandi.

10.4 The binomial probability distribution is often attributed to Bernoulli 
(e.g., see Kendall and Buckland [5]). In view of the murky origins of the 
binomial expansion, such a singular source for its use in probability 
seems suspect. However, it is clear that Bernoulli did use the binomial 

expansion for probability problems. 

The binomial distribution arises from consideration of the following
problem. Several (n) independent trials are to be performed with two
possible outcomes (a success or a failure) on each trial. If the probability
of a success on each trial is p, what is the probability of exactly x
successes? For example, in five (n = 5) tosses of a die, what is the
probability of exactly three (x = 3) fours? The 10 mutually exclusive
events with their associated probabilities are given in Table 10.1.

Table 10.1 Illustrations of Binomial Probabilities*

                     Trial
Event     1    2    3    4    5     Probability of Event
  1       4    4    4    N    N     (1/6)^3 (5/6)^2
  2       4    4    N    4    N     (1/6)^3 (5/6)^2
  3       4    4    N    N    4     (1/6)^3 (5/6)^2
  4       4    N    4    4    N     (1/6)^3 (5/6)^2
  5       4    N    4    N    4     (1/6)^3 (5/6)^2
  6       4    N    N    4    4     (1/6)^3 (5/6)^2
  7       N    4    4    4    N     (1/6)^3 (5/6)^2
  8       N    4    4    N    4     (1/6)^3 (5/6)^2
  9       N    4    N    4    4     (1/6)^3 (5/6)^2
 10       N    N    4    4    4     (1/6)^3 (5/6)^2

*N denotes “not four.”

Now the probability of exactly three fours is the sum of the
probabilities of the mutually exclusive events producing exactly three
fours. So

$$P(x = 3) = \left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^2 + \left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^2 + \cdots + \left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^2 \quad \text{(10 terms)}$$

Every line in Table 10.1 has the same probability. If we can ascertain the
number of lines, we do not need to list the events. It is easily seen that the
number of lines is simply the number of ways that we can pick three
positions from the possible five, $\binom{5}{3} = 10$. The desired probability is then

$$P(x = 3) = \binom{5}{3}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^2 = \frac{250}{7776} \approx .0322$$

Reasoning in this way, we can conclude that the probability of exactly
x successes in n trials is generally

$$P(x) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, \ldots, n, \quad q = 1 - p$$

It is quickly noted that this expression resembles one of the terms of a
binomial expansion. It is, in fact, one of the terms of the binomial
expansion of $(p + q)^n$. For this reason it is known as the binomial
probability distribution or, more simply, the binomial distribution.

It should also be pointed out that we could have arrived at the
binomial probability formula by use of a tree diagram, as in Lecture 9.
The tree would have $2^n$ branches, and the probability of x successes is
the sum of the probabilities associated with $\binom{n}{x}$ branches.
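The two routes to the binomial formula, counting the lines with $\binom{n}{x}$ and summing over the branches of a tree, can be compared exactly using Python's `fractions.Fraction`. This is a sketch; the function names `binomial_pmf` and `tree_sum` are ours.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(x: int, n: int, p: Fraction) -> Fraction:
    """P(exactly x successes in n independent trials) = C(n, x) p^x q^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def tree_sum(x: int, n: int, p: Fraction) -> Fraction:
    """The same probability, found by summing over the 2**n branches of the tree."""
    total = Fraction(0)
    for path in product((True, False), repeat=n):  # True = success on that trial
        if sum(path) == x:
            branch = Fraction(1)
            for success in path:
                branch *= p if success else 1 - p
            total += branch
    return total


p_four = Fraction(1, 6)  # probability of a four on one toss of a fair die
print(binomial_pmf(3, 5, p_four), tree_sum(3, 5, p_four))
```

Both functions should return 125/3888 (that is, 250/7776) for three fours in five tosses.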


10.5 Tables of the binomial distribution have existed for many years. Table
A.2 gives values up to n = 10 for several values of p. For example, we can
read directly from the table that the probability of three successes in
eight trials with p = 1/3 is .2731. If we throw a die eight times, the
probability of getting a one or a two on exactly three throws is .2731.


10.6 Gridgeman [3] uses the binomial distribution to analyze what is
surprising about a family of 11 children: five girls followed by six boys.
He refers to this example, which the Life Science Library presents as a
very rare grouping with a probability of 1/2048. As Gridgeman points out,
any family of 11 children (with boys and girls in any specified order) has
exactly the same probability, $(1/2)^{11} = 1/2048$.

Given that there are 11 children, it is not surprising that there are five
girls and six boys, because a five-six split is more likely than any other.
However, given five girls and six boys, the probability of five girls
followed by six boys or six boys followed by five girls is
$2/\binom{11}{5} = 2/462 = 1/231$, an unlikely event but with a considerably
higher probability than 1/2048.
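The probabilities in Gridgeman's argument are easy to verify exactly. This Python sketch assumes, as the text does, that each child is a girl or a boy with probability 1/2, independently of the others.

```python
from fractions import Fraction
from math import comb

n, girls = 11, 5
p_one_order = Fraction(1, 2) ** n          # any one fully specified sequence of 11 births
p_split = comb(n, girls) * p_one_order     # exactly five girls, in any order
p_blocked = Fraction(2, comb(n, girls))    # given the split: GGGGGBBBBBB or BBBBBBGGGGG

print(p_one_order, p_split, p_blocked)
```

The three values should be 1/2048, 231/1024 (that is, 462/2048), and 1/231.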


10.7 In Lecture 9 an experiment in color guessing was described for which the
probability of identifying six or more cards correctly was 17/70. Suppose the
experiment were repeated 10 times, with six or more cards identified
correctly 8 out of the 10 times. Is this surprising? With n = 10 and p = 17/70,
we want to find P(x = 8) + P(x = 9) + P(x = 10). Since 17/70 is
approximately .25, let us use Table A.2 to find this probability. We see that
it is approximately .0004 + .0000 + .0000 = .0004. This much success
would be so unlikely to occur by chance that it would give strong
support for mental telepathy.
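Table A.2 is entered with p = .25 as an approximation. The sketch below computes the tail P(x ≥ 8) for n = 10 both with p = 1/4 and with the exact p = 17/70, using exact fractions; `binomial_upper_tail` is an illustrative name.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(k, n, p):
    """P(x >= k) for a binomial(n, p) count; p may be a float or a Fraction."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


approx = binomial_upper_tail(8, 10, Fraction(1, 4))   # p = .25, as when reading Table A.2
exact = binomial_upper_tail(8, 10, Fraction(17, 70))  # p = 17/70 from the color-guessing experiment
print(float(approx), float(exact))
```

With p = 1/4 the tail is exactly 436/4^10, about .00042, which the table rounds to .0004. The exact p = 17/70 is a little smaller than .25 and gives a slightly smaller tail, about .0003.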
10.8 The mean and variance are easily found by algebraic methods to be

$$\mu = np \quad \text{and} \quad \sigma^2 = npq$$

These moments also agree with our intuitive interpretations, as
described in Lecture 6. The mean is the average or expected number of
successes, which is just the number of trials multiplied by the probability
of success. This calculation is almost a subconscious one in many cases. If
we toss a coin nine times, we would say the expected number of heads is
4.5 [np = (9)(1/2)]. The variance is not quite as intuitive. However, we
can see from Table A.2 that the variance increases as n increases and as p
approaches one-half. A little thought will convince us that p and q should
enter equally into the variance formula, so that the expression npq is
reasonable.

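To indicate how the algebraic derivation goes, let n = 2. The
frequency distribution is as follows.

x     f
0     (1 - p)^2 = q^2
1     2p(1 - p) = 2pq
2     p^2

Then

$$\mu = \sum f_i x_i = 2p(1 - p) + 2p^2 = 2p$$

and

$$\sigma^2 = \sum f_i (x_i - \mu)^2 = (0 - 2p)^2 q^2 + (1 - 2p)^2\, 2pq + (2 - 2p)^2 p^2$$
$$= 4p^2q^2 + (1 - 4pq)\,2pq + 4p^2q^2$$
$$= 2pq(4pq + 1 - 4pq)$$
$$= 2pq$$

in agreement with np and npq for n = 2. The second line uses
$(1 - 2p)^2 = 1 - 4p + 4p^2 = 1 - 4pq$.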
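The algebra for n = 2 can be checked for other values of n by computing the mean and variance directly from the distribution, as $\sum f_i x_i$ and $\sum f_i (x_i - \mu)^2$. This Python sketch uses exact fractions so that the comparison with np and npq is exact rather than approximate; the particular values of n and p tried are arbitrary choices.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Mean and variance of a binomial(n, p) count, summed directly from the distribution."""
    q = 1 - p
    f = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]  # f_i for x_i = 0..n
    mean = sum(x * fx for x, fx in enumerate(f))
    variance = sum((x - mean) ** 2 * fx for x, fx in enumerate(f))
    return mean, variance


for n in (2, 5, 10):
    for p in (Fraction(1, 6), Fraction(1, 2), Fraction(17, 70)):
        mean, variance = binomial_moments(n, p)
        assert mean == n * p and variance == n * p * (1 - p)
print("mean = np and variance = npq in every case tried")
```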
10.9 We have shown how the binomial distribution arises from the concepts
of the previous lectures, given a few of its properties, and illustrated its 
use in calculations. Do not be misled into thinking that it is useful only 
for card-drawing experiments. Knowledge of the binomial distribution 
has applications in genetics, acceptance sampling, reliability, and many 
other areas. 


SUMMARY. The binomial distribution derives its name from the
familiar binomial expansion of $(a + b)^n$. The binomial coefficients of this
expansion have the form

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

When n independent trials are performed with success or failure
possible on each trial and a constant probability of success p, the
probability of x successes is given by

$$P(x \text{ successes}) = \binom{n}{x} p^x q^{n-x}, \qquad q = 1 - p$$

This is one of the terms in the binomial expansion of $(q + p)^n$.

The mean and variance of the binomial distribution are np and
np(1 - p), respectively.

For large n the calculation of binomial probabilities becomes tedious, 
but tables of the distribution and pocket calculators have eliminated 
most of the difficulties. 

The binomial distribution is illustrated with several examples. 
EXERCISES


1. Show that the following four binomial expansions of $(a + b)^n$ are
equivalent.




2. Construct Pascal’s triangle to include n = 2, 3, ..., 10 and identify
the rows that correspond to particular values of n.

3. Construct a tree diagram to describe the possible sequences of
outcomes for four independent trials. Identify the branches
corresponding to zero successes, three successes, and four successes.

4. A coin is tossed 10 times. Use the binomial formula to find the 
probability of exactly five heads. Use Table A.2 to calculate the 
probability of at least eight heads. 

5. A student takes an examination for which he has not prepared. He
guesses at all 10 items on a true-false test. What is the
probability of getting at least eight correct answers?

6. A die is thrown six times. What is the probability of exactly one six?

7. Two dice are thrown six times. What is the probability of exactly one 
pair of sixes? 

8. Ten percent of the books in a certain library have never been
borrowed. If you select five books at random from the library, what
is the probability that at least two of them will not have been
borrowed previously?

9. Ten repetitions were made of an experiment in which a clairvoyant 
tries to guess the color of each card dealt (as described in Lecture 9). 
The numbers of cards correctly identified are 8, 6, 4, 6, 2, 6, 4, 6, 8, and
0. Make probability calculations to judge whether or not these 
results support the idea of mental telepathy. 

10. It is well known that among rectangles with a given perimeter, the
square has the largest area. Use this result to argue that the binomial
variance is largest when p = 1/2. Why is it intuitively reasonable that
the greatest variability in a binomial variable occurs when p = 1/2?
11. Graph the binomial variance versus p for n = 10. Note that the
maximum variance of n/4 occurs at p = 1/2.

12. Given a family of four children, list all of the possible sequences by 
sex. What is the most probable number of girls? Given that there are 
two boys and two girls, what is the probability of two boys followed 
by two girls? 

13. An examination includes 10 multiple-choice questions with three 
possible choices for each answer. If there is only one correct answer 
for each question, what is the probability of correctly answering at 
least 90% of the 10 questions by guessing? 

14. A gardener plants six hills of squash, six seeds to a hill. If the
germination rate of the seed is .95, what is the probability of at least
one squash plant in each of the six hills?

15. A mail-order house ships unassembled bicycles. The shipping 
inspector makes one last inspection to make sure that all parts have 
been included. If the probability of a complete shipment is .99, what 
is the probability that all of the 1000 bicycles shipped in 1 month 
were shipped complete? 

16. One thousand components are connected in series. What reliability 
is required of each of the components if the reliability required for 
the system is .99? 

17. A woman zealously enters every possible sweepstakes contest. If the 
chance of winning any given sweepstakes is no greater than 1 in 1 
million, what is the probability of winning at least one sweepstakes if 
she enters 1000 times? 

18. With n = 10 and p = .5, find: 

a. $P(\mu - 3\sigma < X < \mu + 3\sigma)$.

b. $P(\mu - 2\sigma < X < \mu + 2\sigma)$.

c. $P(\mu - \sigma < X < \mu + \sigma)$.

19. Verify that the mean and variance of X/n are p and p(1 - p)/n.

20. With n = 30, find the probability that X/n differs from p by no more
than $2\sqrt{p(1 - p)}/\sqrt{n}$ for p = .1, .2, ..., .9.

21. A random sample of students is taken to study student attitudes 
toward a change in dormitory rules. It is desired that the estimate 
favoring the change, X/n, differ from the true fraction, p, by no more 
than .05. How large must the sample size be in order to give this 
margin of error with .95 probability? 









GRESHAM COLLEGE. The building where Karl Pearson first lectured
on probability and statistics. Reproduced by permission from
the Guildhall Library, City of London.
THE NORMAL
DISTRIBUTION 


11.1 The normal distribution, which we encountered briefly in the lectures on
political arithmetic, the casino, and frequency distributions, is the most
widely used probability distribution. It is fitting that there is considerable
interest in tracing its origins. Because the distribution is often called the
Gaussian distribution, it might be supposed that it is due to Gauss (1777-
1855). However, the distribution predates the birth of Gauss. In
giving a date for the first publication of the normal distribution, Walker
[7, p. 4] is quite emphatic about November 12, 1733.


11.2 In the last half of the seventeenth century Bernoulli (1654-1705) began 
work on his Ars Conjectandi {The Art of Conjecture). This monumental 
four-part work was published in 1713, 8 years after Bernoulli’s death. In
Part 4, Bernoulli was concerned with how certainly one can determine a 
probability of heads for a biased coin by many tosses. Bernoulli showed 
that the probability can be determined as precisely as desired if enough 
trials are made. This was an elementary form of the law of large numbers; 

that is, x/n approaches p (in a way that can be made precise) as n 
approaches infinity. 


11.3 While working on the problem posed by Bernoulli, De Moivre (1667- 
1754) discovered an approximate formula for binomial probabilities. 
This formula seems to be the first appearance of the normal distribution. 
According to Walker, the formula was probably discovered in 1721 but 
was first committed to writing by De Moivre in a seven-page Latin 
treatise dated November 12, 1733, entitled “Approximatio ad Summam
Terminorum Binomii $(a + b)^n$ in Seriem expansi.” It was published with
the second and third editions (1738, 1756) of The Doctrine of Chances
and with certain copies of Miscellanea Analytica (1730). Pearson [6]
discovered a copy of the pamphlet in the library of University College
London and called attention to De Moivre as the apparent father of the
normal distribution. Adams [1] recognizes that De Moivre obtained the 
normal distribution but doubts that De Moivre recognized the normal 
distribution as a distribution in its own right. 


11.4 The statistical use of the normal distribution began with the
mathematical astronomers Laplace (1749-1827) and Gauss (1777-1855). The
first tables giving probabilities (areas under the curve) were published in
1799 by the French physicist Kramp; the distribution was first used to
describe data outside the physical sciences by the Belgian statistician
Quetelet (1796-1874). The distribution was first called normal by Pearson
in 1893 and is best known by this name today.


11.5 What was this distribution that so captured the imagination of scientists 
in the late eighteenth and nineteenth centuries? Galton [3] was so 
impressed by it that he wrote: “I know of scarce anything so apt to 
impress the imagination as the wonderful form of cosmic order expressed 

by the ‘Law of Frequency of Error.’” 

The normal distribution is simply a bell-shaped curve (or a collection 
of bell-shaped curves) that does a remarkable job of fitting the 
histograms of data from widely divergent sources. Actually, the typical 
normal curve is sharper at the top than most bells, but it is commonly 
called bell-shaped, and we will persist in this description. The equation of
the curve is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2/(2\sigma^2)}$$

where x ranges between $-\infty$ and $+\infty$, μ represents the mean, and $\sigma^2$
the variance. The effect of varying μ and σ is displayed in Figure 11.1. As
the mean increases, the entire curve moves to the right. As the variance
increases, the curve becomes flatter, with a larger spread.
Figure 11.1 Normal curves f(x) for several values of μ and σ.



11.6 A recent television news item concerned a 10-year-old boy who had 
scored 166 on an IQ test and the efforts of his parents to find special 
education for him. What is the meaning of such a high IQ score? IQ 
scores are generally thought to be normally distributed, although the 
expression IQ is better known to the general public than the normal 
distribution. Cohen [2] traces the development of mental testing from 
the ideas of Galton in the 1890s to lawsuits concerning IQ in the 1970s. 
Another excellent description is given in the Life Science Library volume
called The Mind [8]. The term intelligence quotient, abbreviated IQ, was
coined by an American psychologist named Lewis Terman. Originally 
the idea was to try to obtain the ratio of mental age to chronological age; 

hence the word quotient. Although the IQ score is no longer determined 
in that way, the term has lasted. 

There are actually many intelligence and aptitude tests in use; scores 
from most of these are considered to be normally distributed with 
specified means and variances. Lyman [5] gives a discussion and a 
conversion table for a number of test scores, including the AGCT (Army 
General Classification Test) score, the CEEB (College Entrance 
Examination Board) score, the Stanford-Binet IQ score, and the 
Wechsler IQ score. All of these are normally distributed, with means and
standard deviations as follows.

Score                 Mean    Standard Deviation
AGCT                   100           20
CEEB                   500          100
Stanford-Binet IQ      100           16
Wechsler IQ            100           15


The 10-year-old boy scoring 166 on the Wechsler IQ test has a score 
more than four standard deviations above the mean and is in a select 
group of individuals. We will now discuss the use of the normal curve to 
calculate probabilities, and we can then say just how select a score of 166 
really is. 


11.7 Probabilities are represented by areas under the curve. Thus the
probability that X lies between 4 and 6, P(4 < X < 6), is given by the
area under the curve between 4 and 6. The area under the curve at any 
given point is zero and, therefore, the probability that X equals any given 
value is zero. This fact, which strikes many students as puzzling, is easily 
explained when we remind ourselves that the normal distribution gives a 
curve that approximately describes a large population. In a large 
population resembling the normal curve, the chance of drawing any 
specific value would be small, if not exactly zero. 

Many tabulations have been made of the area under the normal curve.
In Table A.3, we present tables due to Pearson and Hartley. These tables
give P(X < x) when μ = 0 and σ = 1. The case where μ = 0 and σ = 1 is
called the standard normal distribution, and it is customary to denote a
standard normal variable by Z. We will adhere to this custom in
subsequent sections.

With the use of this table and a little effort, one can find any desired 
area under the curve. It may be helpful to make sketches showing the 
area known and the area desired. Then, keeping in mind that the curve is 
symmetric about zero, we can find the desired area much in the manner 
of solving the “pie problems” of elementary school. 
Figure 11.2 Using Normal Tables.



Example. What is P(-1.5 < Z < 1)? Looking at the sketch in Figure
11.2, we can see that

$$
\begin{aligned}
P(-1.5 < Z < 1) &= P(Z < 1) - P(Z < -1.5) \\
&= P(Z < 1) - [1 - P(Z < 1.5)] \\
&= .8413 - .0668 \\
&= .7745
\end{aligned}
$$
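As a cross-check on the table lookup above, the same area can be computed directly. The following is a minimal Python sketch, assuming only the standard `math` module; it uses the identity P(Z < z) = [1 + erf(z/√2)]/2, and the function name `std_normal_cdf` is our own illustrative choice, not anything defined in Table A.3.

```python
import math

def std_normal_cdf(z: float) -> float:
    """P(Z < z) for a standard normal Z, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(-1.5 < Z < 1) = P(Z < 1) - P(Z < -1.5)
p = std_normal_cdf(1.0) - std_normal_cdf(-1.5)
print(f"P(-1.5 < Z < 1) = {p:.4f}")  # prints 0.7745
```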


11.8 The real utility of the standard normal table lies in the fact that the same
table can be used to find the area under any normal curve, regardless of μ
and σ. The secret lies in first standardizing the variable. This is
accomplished by subtracting the mean and then dividing by the standard
deviation. In symbols, this means that (X - μ)/σ is a standard normal
variable and can be denoted by Z. If we have mastered the use of the
standard normal table, we can then find areas under the normal curve for
X by standardizing X. Suppose we want

$$ P(a \le X \le b) $$

Subtracting μ from each term and dividing by σ, the desired probability
becomes

$$ P(a \le X \le b) = P\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right) $$
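The standardizing step can be expressed as a small function. This is a sketch, assuming Python's standard `math` module; `normal_interval` and `std_normal_cdf` are hypothetical names introduced here for illustration. Because the normal curve assigns zero probability to any single point, it makes no difference whether the endpoints are included.

```python
import math

def std_normal_cdf(z: float) -> float:
    """P(Z < z) for a standard normal Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for X normal with mean mu and standard deviation sigma.

    Each endpoint is standardized as (value - mu) / sigma, and the
    standard normal CDF is differenced, as described in the text.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return std_normal_cdf(z_b) - std_normal_cdf(z_a)
```

For instance, `normal_interval(85, 130, 100, 15)` should come out close to the value .8185 that the following example obtains from the table.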








Example. If scores on a standardized test are normally distributed with
mean 100 and standard deviation 15, what fraction of scores will lie
between 85 and 130? The answer is given by

$$
\begin{aligned}
P(85 < X < 130) &= P\left(\frac{85-100}{15} < Z < \frac{130-100}{15}\right) \\
&= P(-1 < Z < 2) \\
&= .9772 - .1587 \\
&= .8185
\end{aligned}
$$

A graphical approach to this solution is given by the sketch in Figure 
11.3. 




Figure 11.3 Fraction of Standardized Scores. 


It is essential that students learn to use the standard normal table. We 
suggest that the student use sketches such as those in Figures 11.2 and 
11.3. 

Returning to the case of the 10-year-old with an IQ of 166, we might
ask what fraction of the population has a higher IQ. That is, we desire

P(X > 166)

Standardizing the score by subtracting μ = 100 and dividing by σ = 15
gives (166 - 100)/15 = 4.4, so we obtain

P(Z > 4.4)

From Table A.3 we find this probability to be less than .00003. Thus
there are very few individuals with higher IQs.
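The tail area for the IQ example can be computed the same way. A minimal sketch, assuming Python's standard `math` module; `upper_tail` is a hypothetical helper name. It uses `math.erfc` rather than 1 - P(Z < z), since subtracting two nearly equal numbers loses precision this far out in the tail. The result is roughly 5 in a million, consistent with the bound of .00003 read from Table A.3.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mu, sigma = 100, 15        # Wechsler IQ scale, from the table in Section 11.6
z = (166 - mu) / sigma     # standardized score: 66 / 15 = 4.4
p = upper_tail(z)
print(f"z = {z:.1f}, P(Z > z) = {p:.2e}")
```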


11.9 The normal distribution provides an adequate description, not only of 
IQ scores, but also of data from a wide range of origins. Many sets of data 
obtained by measurement processes are normally distributed. 
SUMMARY. The normal distribution was popularized by Gauss in the
early nineteenth century and was generally known as the Gaussian 
distribution. Pearson first called it the normal distribution and pointed to 
De Moivre as the originator. Still, it is often called the Gaussian 
distribution. 

This distribution of a continuous variable is a symmetric, somewhat
bell-shaped curve with equation

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty $$

The population constants, or parameters, μ and σ² completely specify
the distribution.

The normal distribution has been extensively tabled for the standard 
normal case where the mean is zero and the variance is unity. This allows 
calculation of probabilities for any normal distribution through
standardizing the variable. Standardizing consists of first subtracting the
mean and then dividing by the standard deviation. 

IQ scores and scores on many standardized tests are generally 
regarded as having a normal distribution. In addition, grading on the 
curve assumes a normal distribution of test scores. 
EXERCISES

1. If five observations are obtained at random from a normal 
distribution, what is the probability that at least two of the 
observations will be greater than μ?

2. Using Table A.3, find:

a. P(Z > 1.5).

b. P(Z > 1.64).

c. P(Z > 1.645). 

d. P(-1.96<Z< 1.96). 

e. P(-1.645 < Z < 1.645). 

3. If μ = 100 and σ = 20, find P(X > 120).

4. If μ = 500 and σ = 100, find P(350 < X < 800).

5. An instructor wishes to “grade on the curve.” The students’ scores 
for a multiple-section course seem to be normally distributed with 
mean 70 and a standard deviation of 8. If the instructor wishes to 
give 10% A's, what should be the dividing line between an A grade
and a B grade? 

6. Although the table does not include such large values, about what is
the probability that Z lies between 6 and 9? 

7. If the scores on an intelligence test were normally distributed with 
mean 100 and a standard deviation of 16, how high would a person 
have to score to receive a genius rating if we give such ratings to only 
0.5% of the people taking the exam? 

8. Using Table A.3, graph z on the horizontal axis versus the
cumulative probability on the vertical axis. 

9. If a student’s standardized CEEB score was 1.2, what was the raw 
score? 

10. If μ = 120 and σ = 15, find a constant c so that P(-c < X < c) = .5.

11. If μ = -150 and σ = 15, find a constant c so that P(-c < X < c) = .75.

12. The percent of alcohol in a nasal decongestant syrup is advertised to 
be 1.4%. The manufacturing process is such that the percent alcohol 
is normal with mean 1.4% and standard deviation 0.1%. Find the 
probability that the percent alcohol in one batch exceeds 1.5%. 

13. A breakfast cereal is packaged and sold in 14-ounce packages. 
Detailed analyses by the shipping inspector reveal that the actual
weight is a normal random variable with standard deviation 0.2 
ounces. If the fraction of packages with weight less than or equal to 
14.5 ounces is .9, what is the mean weight of 14-ounce packages? 
14. A person is told that he is overweight because for his height he
should weigh between 132 and 145 pounds. If weight is a normal 
variable and 95% of people his height have weights within the 
prescribed range, about what is the average weight? 

15. An electrical system has 10 components connected in series. Each
component will function properly as long as its voltage lies between
the engineering specifications of 100 and 130 volts. The voltage of each
device is a normal variable with a
mean of 115 volts and a standard deviation of 5 volts. What is the 
reliability for the system? 

16. The number of monthly orders received by a wholesale house for a 
particular item seems to be a binomial random variable with 
n = 100,000 and p = .4. The wholesale price for the item is $15. 
Calculate the mean and variance for the total monthly sales. 
Approximate the probability that the total monthly sales exceed 
$650,000 by treating the total monthly sales as a normal variable. 

17. Plot the binomial probabilities for n = 10, p = .50. Then represent
the binomial distribution by a histogram using intervals -0.5 to 0.5,
0.5 to 1.5, etc.

18. Approximate the binomial probability that X = 4 for Exercise 17 by
the area under a normal curve from 3.5 to 4.5. Use a mean of 5 (np)
and a variance of 2.5 [np(1 - p)].




SIMEON DENIS POISSON. 1781-1840. Courtesy Columbia 
University Library, D. E. Smith Collection. 





POISSON'S EXPONENTIAL LIMIT


12.1 Poisson (1781-1840), examiner and professor at the École Polytechnique
of Paris for nearly 40 years, wrote over 300 papers about mathematics,
physics, and astronomy. Although his most famous work was Traité de
Mécanique (1811), he is best known in probability and statistics for the
Poisson distribution, which was given as a limit of the binomial 
distribution in 1837. Poisson’s exponential limit of the binomial [6] 
seems to have been predated by the same result obtained by De Moivre. 
David [2] points out that De Moivre first gave the binomial exponential 
limit in the first edition of The Doctrine of Chances in 1718. Nevertheless, 
the distribution continues to be known as the Poisson distribution. 

12.2 With a rudimentary knowledge of combinations, limits, and the definition 
of e we can obtain the same result as De Moivre and Poisson. In order to 
appreciate the Poisson as well as the normal distribution, it is necessary 
to have a basic understanding of the important mathematical constant, 
e. The definition generally given for e is

$$ e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n $$
Fortunately, it is very easy to observe the limiting process called for in 
this definition. Let us evaluate a few terms in the limiting sequence. 


$$
\begin{aligned}
(1 + 1/1)^1 &= 2.00000 \\
(1 + 1/2)^2 &= 2.25000 \\
(1 + 1/3)^3 &= 2.37037 \\
(1 + 1/4)^4 &= 2.44141 \\
(1 + 1/5)^5 &= 2.48832
\end{aligned}
$$
By continuing this process, students can evaluate e to any desired
precision. To five decimal places, e has the value 2.71828. This evaluation
is particularly easy on most calculators.
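The limiting process is easy to automate. This short Python sketch (standard library only; the helper name `e_term` is ours) prints the first few terms listed above and then some much larger values of n. Convergence is slow: the error is roughly e/(2n), so even n = 1000 still misses e by more than .001.

```python
import math

def e_term(n: int) -> float:
    """The n-th term (1 + 1/n)**n of the sequence whose limit defines e."""
    return (1 + 1 / n) ** n

for n in (1, 2, 3, 4, 5, 10, 100, 1000, 1_000_000):
    print(f"n = {n:>9,}: {e_term(n):.5f}")

print(f"e             = {math.e:.5f}")
```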

We now remind students of another result that we need to obtain the
Poisson distribution as a limiting case of the binomial:

$$ \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^x $$

This result can also be verified numerically with a calculator.
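The same kind of numerical check works for the general result. In this sketch (standard library only; `compound` is a hypothetical helper name), x = -2 is an arbitrary illustrative value; a negative x is the case needed below, where x = -λ.

```python
import math

def compound(x: float, n: int) -> float:
    """The n-th term (1 + x/n)**n, whose limit as n grows is e**x."""
    return (1 + x / n) ** n

x = -2.0   # illustrative value; the derivation below uses x = -lambda
for n in (10, 100, 1000, 100_000):
    print(f"n = {n:>7}: (1 + x/n)^n = {compound(x, n):.6f}")
print(f"e**x            = {math.exp(x):.6f}")
```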


12.3 With an understanding of e and e^x we can proceed to obtain the limiting
form of the binomial distribution as n becomes large and p becomes 
small in such a way that the product np is constant. First, write the 
binomial probability function as 



$$ f(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n $$

Next, observing that 

$$
\begin{aligned}
n! &= n(n-1)(n-2)\cdots(n-x+1)(n-x)\cdots(2)(1) \\
&= n(n-1)(n-2)\cdots(n-x+1)\,(n-x)!
\end{aligned}
$$


we have that 



$$ f(x) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\, p^x (1-p)^{n-x} $$


Next replace p by λ/n (because np = λ) to obtain



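$$
\begin{aligned}
f(x) &= \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x} \\
&= \frac{n(n-1)(n-2)\cdots(n-x+1)}{n \cdot n \cdots n}\cdot\frac{\lambda^x}{x!}\left(1 - \frac{\lambda}{n}\right)^{n-x}
\end{aligned}
$$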


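Now consider what happens when n becomes indefinitely large. The first
factor can be written (1)(1 - 1/n)(1 - 2/n)...(1 - (x - 1)/n); each of its
x terms approaches unity, and we are left with

$$ \lim_{n\to\infty} f(x) = \frac{\lambda^x}{x!}\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n-x} $$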
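Now, as n approaches infinity, so does n - x. More precisely,
(1 - λ/n)^(n-x) = (1 - λ/n)^n (1 - λ/n)^(-x), and the second factor
approaches unity because x is fixed, so

$$ \lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n-x} = \lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda} $$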


Finally, we obtain the famous result that the limit of the binomial is given
by the exponential expression

$$ f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots $$

This distribution is known as the Poisson distribution. Naturally, since n 
was allowed to become infinite, the possible values of x extend from zero 
to infinity. 
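The limit can also be watched numerically. The sketch below, assuming Python 3.8 or later for `math.comb`, holds np = λ fixed at the illustrative value 2 and reports the largest gap between the binomial and Poisson probabilities for x = 0 to 10 as n grows; all function names are our own.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability of exactly x successes in n trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability lam**x * e**(-lam) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def max_gap(n: int, lam: float, x_max: int = 10) -> float:
    """Largest |binomial - Poisson| over x = 0..x_max, with p = lam / n so np = lam."""
    p = lam / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(x_max + 1))

lam = 2.0
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: largest gap = {max_gap(n, lam):.2e}")
```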


12.4 The moments of the Poisson are particularly easy to obtain and are well 
known. The remarkable fact is that both the mean and the variance are 
equal to the constant (or parameter) λ. To be explicit,

$$ \mu = \lambda, \qquad \sigma^2 = \lambda $$



This result can be given intuitive justification as a limiting property of 
the binomial mean and variance. With n getting large in such a way that 
np is constant, p gets small as n becomes large. Then 


$$ \lim_{n\to\infty} np = \lambda $$

and, writing q = 1 - p,

$$ \lim_{n\to\infty} npq = \lim_{p\to 0} \lambda(1 - p) = \lambda $$
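A numerical check of μ = σ² = λ is straightforward. This sketch (standard library only; names are hypothetical) sums over x = 0 to 100 instead of to infinity. That truncation is an approximation, but the omitted tail is negligible for the values of λ used in this lecture.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    return lam**x * math.exp(-lam) / math.factorial(x)

def poisson_moments(lam: float, x_max: int = 100):
    """Return (mean, variance) of a Poisson(lam) variable by direct summation.

    The sum stops at x_max instead of infinity; for lam well below x_max
    the neglected tail probability is negligible.
    """
    probs = [poisson_pmf(x, lam) for x in range(x_max + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, variance

for lam in (0.61, 3.8, 9.0):
    mean, variance = poisson_moments(lam)
    print(f"lambda = {lam}: mean = {mean:.6f}, variance = {variance:.6f}")
```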


12.5 The Poisson distribution has been extensively tabled, and Table A.4
gives f(x) for x from 0 to 24. For example, if λ = 3.8, f(3) = 0.2046. If
λ = 9.0, f(3) = 0.0150.
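Entries of Table A.4 can be reproduced directly from the formula. A minimal Python sketch; `poisson_pmf` is our own name, not a library function.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """f(x) = lam**x * e**(-lam) / x!"""
    return lam**x * math.exp(-lam) / math.factorial(x)

for lam in (3.8, 9.0):
    print(f"lambda = {lam}: f(3) = {poisson_pmf(3, lam):.4f}")
# lambda = 3.8: f(3) = 0.2046
# lambda = 9.0: f(3) = 0.0150
```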
12.6 One of the classic, if today somewhat quaint, examples of using the
Poisson distribution was given in 1898 by Bortkiewicz [1]. He tabulated 
the number of cavalrymen who died from a kick from a horse. Some of 
the Bortkiewicz data were presented earlier in Lecture 5 (see Fisz [3]). 

Taking 10 cavalry corps over 20 years gives a total of 200 annual 
records. Consider the number of horsekick deaths (r) reported in each of 
these 200 records and the frequency for each value of r reported. 
Bortkiewicz’s results are given in Table 12.1. 

In 109 of the 200 records, no horsekick deaths were reported. In 65 of 
the 200 records, one horsekick death was reported, etc. 

From the data the sample mean is obtained as

$$ \bar{x} = \frac{(0)(109) + (1)(65) + (2)(22) + (3)(3) + (4)(1)}{200} = 0.61 $$


The mean obtained from the data is then used as the value for λ in the
probability distribution. This gives us the equation

$$ f(r) = \frac{e^{-0.61}(0.61)^r}{r!}, \qquad r = 0, 1, 2, \ldots $$

which was introduced in Lecture 5 and which can be used to calculate the
theoretical probabilities. The values can also be approximated by using
Table A.4 with λ = 0.6.

The Bortkiewicz data are well described by the Poisson distribution.
Our interest in this particular example is now purely historical.
However, it is easy to conceive of the Poisson distribution arising in this
example as a limiting approximation to a binomial. A cavalryman will
either be killed by horsekick during a year or not. It is reasonable to
suppose that the chance of this rare event is the same for all soldiers and
that soldiers have independent chances of being killed. Thus the number
of cavalrymen killed during a year is a binomial variable. However, the
probability of being killed, p, is very small and the number of trials, the
number of cavalrymen, is very large. Therefore the Poisson limit gives a
very good description of these data.


Table 12.1 Frequency Distribution of Number of
Cavalrymen Killed by Horsekick

r    Frequency    Relative Frequency    Theoretical Probability
0       109             0.545                   .544
1        65             0.325                   .331
2        22             0.110                   .101
3         3             0.015                   .021
4         1             0.005                   .003
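The sample mean and the theoretical probabilities in Table 12.1 can be recomputed from the frequencies. In this sketch (standard library only; variable names are our own), the computed probabilities agree with the Theoretical Probability column of Table 12.1 to within about .001.

```python
import math

# Table 12.1: r = horsekick deaths in a corps-year -> number of records with that r
counts = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}
total = sum(counts.values())                            # 200 records
lam = sum(r * k for r, k in counts.items()) / total     # sample mean, used as lambda

def poisson_pmf(r: int, lam: float) -> float:
    return lam**r * math.exp(-lam) / math.factorial(r)

print(f"records = {total}, sample mean = {lam:.2f}")
for r, k in counts.items():
    print(f"r = {r}: relative frequency = {k / total:.3f}, "
          f"theoretical = {poisson_pmf(r, lam):.3f}")
```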


12.7 Another example of using the Poisson distribution for the description of 
real data was published in 1907 by “Student” [7]. He also gives a 
derivation of the exponential limit to the binomial without reference to 
either Poisson or De Moivre. “Student” gives the frequencies of yeast 
cells counted in a thin layer of liquid spread out over a slide. The 
frequency counts for 400 small areas are given by Table 12.2. 

As with the horsekick data, we can obtain the sample mean x̄ =
0.6825. Using this value for λ, we can obtain the theoretical probabilities
and see that the Poisson distribution does a good job of describing the
data. We can also approximate these probabilities by using Table A.4
with λ = 0.7. Students are encouraged to read this paper to see how the
Poisson distribution arises as a limiting form of the binomial. 
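For the yeast data it is also natural to compare counts rather than proportions. The sketch below, again standard library only with hypothetical names, recomputes x̄ = 0.6825 from the frequencies of Table 12.2 and converts the Poisson probabilities into expected numbers of small areas out of 400.

```python
import math

# Table 12.2: r = yeast cells in a small area -> number of areas with that count
counts = {0: 213, 1: 128, 2: 37, 3: 18, 4: 3, 5: 1}
n_areas = sum(counts.values())                           # 400 small areas
lam = sum(r * k for r, k in counts.items()) / n_areas    # sample mean x-bar

def poisson_pmf(r: int, lam: float) -> float:
    return lam**r * math.exp(-lam) / math.factorial(r)

print(f"x-bar = {lam}")
for r, observed in counts.items():
    expected = n_areas * poisson_pmf(r, lam)
    print(f"r = {r}: observed {observed:3d}, expected {expected:6.1f}")
```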


Table 12.2 "Student's" Yeast Cell Data

r    Frequency    Relative Frequency    Theoretical Probability
0       213            0.5325                  .505
1       128            0.32                    .345
2        37            0.0925                  .1175
3        18            0.045                   .0275
4         3            0.0075                  .0046
5         1            0.0025                  .0006

Source. Reproduced from Biometrika, 5, "Student," On the error
of counting with a haemacytometer. Copyright © 1906 by the
Biometrika Trustees. Reprinted by permission of the Biometrika
Trustees.


12.8 Mullet [5] used the Poisson distribution to describe the number of goals
scored for and against each of the teams in the National Hockey League
in the 1973-1974 season. For example, Boston scored 4.95 goals on the
average in its home games. The frequency distribution of number of
goals per game is shown in Table 12.3 along with theoretical
probabilities calculated by using λ = 4.95. These probabilities can be
closely approximated by using Table A.4 with λ = 5.0. Mullet suggests
that because the number of goals in each game is like a random
observation from a Poisson distribution, the game of hockey is quite
predictable.


Table 12.3 Boston's Home Goals: 1973-1974

r      Frequency    Relative Frequency    Theoretical Probability
0-1        1             0.0256                  .0421
2          2             0.0513                  .0868
3          5             0.1282                  .1432
4          9             0.2308                  .1772
5         10             0.2564                  .1754
6          5             0.1282                  .1447
7          2             0.0513                  .1023
8          3             0.0769                  .0633
9          1             0.0256                  .0348
≥10        1             0.0256                  .0302

Source. Reproduced from The American Statistician, 31. Gary
M. Mullet, Simeon Poisson and the National Hockey League.
Copyright © 1977 by The American Statistician. Reprinted by
permission of the American Statistical Association.
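Table 12.3 groups the extreme classes, so the theoretical probabilities for "0-1" and "≥10" are sums. This sketch (standard library only; names are ours) obtains the open-ended top class by subtraction from 1, which sidesteps summing an infinite tail.

```python
import math

def poisson_pmf(r: int, lam: float) -> float:
    return lam**r * math.exp(-lam) / math.factorial(r)

lam = 4.95   # Boston's average home goals per game, 1973-1974
# Group the classes as in Table 12.3: "0-1", 2, ..., 9, and ">=10"
probs = {"0-1": poisson_pmf(0, lam) + poisson_pmf(1, lam)}
for r in range(2, 10):
    probs[str(r)] = poisson_pmf(r, lam)
probs[">=10"] = 1.0 - sum(probs.values())   # whatever probability is left over

for label, p in probs.items():
    print(f"{label:>5}: {p:.4f}")
```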


12.9 In summary, the Poisson distribution can be used as an approximation 
for the binomial (large n and small p) and as a distribution in its own right 
for counting data of certain types, such as number of radioactive 
particles emitted in a given time period, number of telephone calls 
received in a given time period, number of equipment failures in a given 
time period, and even number of cavalrymen killed by horsekick in a 
year. 


SUMMARY. The Poisson distribution, like the normal distribution, 
was first obtained by De Moivre but generally credited to someone else, 
in this case Poisson. 

With an understanding of the constant e as the limit of (1 + 1/n)^n as n
becomes large, it is possible to obtain the limit of the binomial
distribution. As n goes to infinity and p goes to zero so that np = λ
remains constant, the limit of the binomial distribution is

$$ f(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots $$
Both the mean and variance of the Poisson distribution are found to
equal λ. This is a somewhat surprising result, but it can also be verified by
taking the limit of the binomial mean and variance as n goes to infinity
with np = λ.

The Poisson distribution is also extensively tabled and is used to 
describe the probability of rare events such as horsekick deaths, lightning 
flashes, and equipment failures. 
EXERCISES

1. Using a calculator, evaluate [1 + (1/n)]^n for n = 6, 7, 8, 9, 10. Does
the sequence seem to be converging toward the stated value?

2. Using a calculator with a key for e, evaluate e^3. Next calculate
[1 + (3/n)]^n for n = 1, 2, 3, ..., 10. Does the sequence seem to be
converging toward the value for e^3?

3. For a binomial distribution with n = 50 and p = .1, calculate the
probabilities for x = 0, 5, 10, 15, 20. Using the Poisson formula with
λ = np = 5, calculate the probability that x = 0, 5, 10, 15, 20.
Compare the Poisson probabilities with the binomial probabilities.

4. Using Table A.4, calculate the probability of at least five occurrences
for λ = 1, 2, 3, 4, 5.
5. Using the value of 0.61 for λ, calculate the "theoretical probabilities"
for the Bortkiewicz data. Graph the observed relative frequencies
and theoretical probabilities on the same graph paper.

6. Using the value of 0.6825 for λ, calculate the theoretical probabilities
for "Student's" yeast cell data. Graph the observed relative frequencies
and theoretical probabilities on the same graph paper.

7. Find a set of data that should be well described by the Poisson
distribution (duration of strikes, numbers of traffic accidents,
bacterial counts, numbers of severe storms, etc.). Using the average
of the data as the value for λ, calculate the theoretical frequencies
and compare with the observed.

8. Using the value of 4.95 for λ, calculate the theoretical probabilities
for the hockey goal data of Table 12.3. Graph the observed relative
frequencies and theoretical probabilities on the same graph.

9. Graph the Poisson probability function for λ = 1, 2, ..., 10.

10. Represent the Poisson distribution with λ = 10 by a histogram with
classes -0.5 to 0.5, 0.5 to 1.5, 1.5 to 2.5, etc. Approximate P(X = 8)
(the area of the bar centered on 8) by the area under a normal curve
with mean 10 (λ) and variance 10 (λ).

11. Use the Poisson distribution to calculate the binomial probability of
Lecture 10, Exercise 15, with λ = np.

12. Use the Poisson distribution to calculate the binomial probability of
Lecture 10, Exercise 17.

13. A municipal power plant has experienced occasional power outages
for some parts of town. A new industry is considering locating a new
facility in the town but is concerned about the possibility of a power
shortage. The number of power outages per year seems to be a
Poisson with λ = 6. What is the probability of no more than two
power failures per year?

14. The manager of a large pecan orchard uses a spray program to
control the pecan weevil population. The number of pecan weevils
per tree is a Poisson variable with parameter λ. He wishes to control
the weevil population so that the probability is .95 that there are no
more than 50 weevils per tree. What should be the mean number of
weevils per tree?

15. The physical plant manager for a university has become worried
about the saturation of university telephone lines and is considering
installing some lines for emergency calls only. If the number of calls
reaches 20,000 calls per hour, the telephone system for offices is
saturated. What is the greatest value λ can take so that the probability
is .95 that the number of calls does not exceed 20,000? Use the normal
approximation to determine an answer.

16. Equipment problems with a fleet of airplanes necessitate a new
maintenance schedule. Any airplane with more than three minor
difficulties in 250 hours of operation receives a complete maintenance
check. The number of equipment problems per 1000 hours of
operation is a Poisson variable with parameter λ = 10. What
fraction of planes in the fleet will be expected to have complete
maintenance checks after 250 hours of operation?

17. A motorist has received her third traffic ticket in a month for running
a red light at an intersection on the way to work and has become
convinced that the traffic light is not functioning properly. She
conducts her own study of the traffic light and determines that the
number of cars it allows through on green is a Poisson variable with
parameter λ = 3. What is the probability of a traffic violation for the
first car waiting? For the second car? For the third car?
SURVEY SAMPLING. Always a chance for an error. UPI.
13.1 For all sad words of tongue or pen

The saddest are these: "It might have been!"

John Greenleaf Whittier 

In Lecture 1 numerous references were made to early censuses and
census taking. A modern development in census taking is the deliberate 
use of sampling. In the 1970 U.S. census of population and housing most 
of the information was collected by sampling (see Procedural History, 
1970 Census of Population and Housing). The complete census was used 
for the population items of age, sex, race, relationship to head of 
household, and marital status. For most of the other information on 
population, the basic sample was a 20% sample; a smaller sample was 
taken on some items. 

We are exposed almost daily to the results of public opinion polls on 
every item under the sun. Such polls have wide acceptance by the general 
public, although a healthy scepticism remains—perhaps because of 
disasters such as the polls that predicted Landon’s victory over 
Roosevelt in 1936 and Dewey’s victory over Truman in 1948. Despite 
general acceptance, there is a mystique associated with drawing valid 
conclusions from a sample. We wish to try to remove some of the 
mystery; to do so, we will consider in detail a small numerical example. 


13.2 Consider the miniature city consisting of seven households, as described
in Table 13.1. From this table we can easily calculate the average
annual household income, μ, to be $29,000.


Table 13.1 Miniature City

Households          Annual Income
                    (thousands of dollars)
North of river
   1                      30
   2                      35
   3                      78
South of river
   4                      15
   5                      13
   6                      12
   7                      20

Suppose that Table 13.1 were not available to us and we decided to 
find out about annual income by taking a simple random sample of size 5. 
If the sample consisted of households 1, 2, 4, 6, and 7, the sample average,
x̄, would be $22,400. In reporting this value we would realize that other
values of x̄ could have occurred, and we would be led to describing the
uncertainty in the single value of x̄ we obtained in terms of the values of x̄
that might have been.

The careful consideration of what might have been is an important 
part of statistical thinking. We must consider the meaning of a statistic 
calculated from a single sample from a given population. We consider 
the conceptual population of statistic values that could have been 
generated if all possible samples had been drawn. For this conceptual 
population there is a probability distribution called a sampling
distribution, and this distribution is then used to interpret the meaning of a
statistic value from a single sample. 


13.3 With the population of Table 13.1, what are the values of x̄ that could
have been generated by simple random sampling? With this small
population we can actually list all possible samples, with the resulting x̄,
as in Table 13.2. There are 21 possible samples of 5 households from 7,
so each has probability 1/21.

From Table 13.2 we can calculate the mean and variance of x̄ to be

$$ \mu_{\bar{x}} = 29 $$

and

$$ \sigma^2_{\bar{x}} = \frac{\sum(\bar{x} - 29)^2}{21} = 31.05 $$
Table 13.2 Simple Random Sampling

Sample Households      x̄      Probability
1  2  3  4  5         34.2       1/21
1  2  3  4  6         34.0       1/21
1  2  3  5  6         33.6       1/21
1  2  4  5  6         21.0       1/21
1  3  4  5  6         29.6       1/21
2  3  4  5  6         30.6       1/21
1  2  3  4  7         35.6       1/21
1  2  3  5  7         35.2       1/21
1  2  4  5  7         22.6       1/21
1  3  4  5  7         31.2       1/21
2  3  4  5  7         32.2       1/21
1  2  3  6  7         35.0       1/21
1  2  4  6  7         22.4       1/21
1  3  4  6  7         31.0       1/21
2  3  4  6  7         32.0       1/21
1  2  5  6  7         22.0       1/21
1  3  5  6  7         30.6       1/21
2  3  5  6  7         31.6       1/21
1  4  5  6  7         18.0       1/21
2  4  5  6  7         19.0       1/21
3  4  5  6  7         27.6       1/21

Table 13.2 provides a fundamental way of thinking about all possible
samples, but it is not necessary to construct the table in order to get the
variance of x̄. In the theory of survey sampling this variance is derived to
be

$$ \sigma^2_{\bar{x}} = \frac{N-n}{N}\cdot\frac{S^2}{n} $$

where

N = the population size
n = the sample size
S² = the population total mean square, Σ(x - μ)²/(N - 1)


From the data in Table 13.1 we find that S² = 543.33, so that

$$ \sigma^2_{\bar{x}} = \frac{N-n}{N}\cdot\frac{S^2}{n} = \frac{7-5}{7}\cdot\frac{543.33}{5} = 31.05 $$
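With a population this small, Table 13.2 can be generated by brute force and checked against the formula. A sketch using only Python's standard `itertools` and `statistics` modules; the dictionary `income` simply transcribes Table 13.1, and the other names are our own.

```python
from itertools import combinations
from statistics import mean

# Table 13.1: household -> annual income (thousands of dollars)
income = {1: 30, 2: 35, 3: 78, 4: 15, 5: 13, 6: 12, 7: 20}
N, n = len(income), 5

# Every simple random sample of size n is equally likely (21 samples here).
sample_means = [mean(income[h] for h in s) for s in combinations(income, n)]
mu = mean(income.values())
var_enum = sum((xbar - mu) ** 2 for xbar in sample_means) / len(sample_means)

# Formula from the text: (N - n)/N * S^2 / n, with S^2 = sum (x - mu)^2 / (N - 1)
S2 = sum((x - mu) ** 2 for x in income.values()) / (N - 1)
var_formula = (N - n) / N * S2 / n

print(len(sample_means), mu, round(var_enum, 2), round(S2, 2), round(var_formula, 2))
```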


Let us summarize what we have learned. If we take a simple random
sample of size n and calculate the sample average, x̄, we can make the
following statements.


1. The average of all possible x̄s that could have been obtained is the
mean of the original population, μ. In terms of expected value, we say
E(x̄) = μ and call x̄ unbiased.

2. The variance of all possible x̄s is given by the formula

$$ \sigma^2_{\bar{x}} = \frac{N-n}{N}\cdot\frac{S^2}{n} $$

For n small relative to N, the variance of x̄ is closely approximated by

$$ \sigma^2_{\bar{x}} \approx \frac{S^2}{n} $$


13.4 The variance formula involves unknown parameters and, therefore,
must be estimated. An unbiased estimate of the variance is given by

$$ s^2_{\bar{x}} = \frac{N-n}{N}\cdot\frac{s^2}{n} $$

where s² = Σ(x_i - x̄)²/(n - 1), the sample total mean square. To
illustrate, suppose we had drawn the sample {1, 3, 4, 5, 6}. Then
s² = 785.3, so

$$ s^2_{\bar{x}} = \frac{7-5}{7}\cdot\frac{785.3}{5} = 44.87 $$

and s_x̄ = 6.70.

If the population mean income were unknown to us, we would then
report our sample mean of x̄ = 29.6 with an estimated standard
deviation of 6.70.
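The estimate for the sample {1, 3, 4, 5, 6} can be reproduced as follows. In this sketch, `statistics.variance` uses the divisor n - 1, which matches the sample total mean square s²; the variable names are our own.

```python
import math
from statistics import mean, variance

income = {1: 30, 2: 35, 3: 78, 4: 15, 5: 13, 6: 12, 7: 20}   # Table 13.1
N = len(income)
sample = [1, 3, 4, 5, 6]                                       # households drawn
values = [income[h] for h in sample]
n = len(values)

xbar = mean(values)
s2 = variance(values)                  # divisor n - 1: the sample total mean square
est_var = (N - n) / N * s2 / n         # unbiased estimate of the variance of xbar
est_sd = math.sqrt(est_var)
print(f"xbar = {xbar:.1f}, s^2 = {s2:.1f}, "
      f"estimated variance = {est_var:.2f}, sd = {est_sd:.2f}")
```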


13.5 Quite frequently the population is blocked (or stratified) into two or
more blocks (strata) and a random sample is taken from each stratum.
For our miniature city, suppose we stratify our city into two strata:
households north of the river and households south of the river. Then we
take a random sample of two households north of the river and a
random sample of three households south of the river. We can perform
the calculations of the previous section for both sides of the river. This
gives the two sample averages x̄_N and x̄_S, which we combine by
calculating x̄_st = (3x̄_N + 4x̄_S)/7, weighting each stratum average by
the number of households in that stratum. These calculations are
summarized in Table 13.3. The average of the possible values of the
stratified estimate is 29.00, as with x̄, so x̄_st is also an unbiased
estimate of μ. A comparison of the values of x̄_st in Table 13.3 with the
values of x̄ in Table 13.2 reveals that the stratified estimate is much less
variable. This is borne out by the fact that the variance of x̄_st is 21.66,
whereas the variance of x̄ was 31.05.


Table 13.3 Stratified Random Sampling

North         x̄_N     South           x̄_S      x̄_st
Households            Households
1  2          32.5    4  5  6         13.33    21.55
1  2          32.5    4  5  7         16.00    23.07
1  2          32.5    4  6  7         15.67    22.88
1  2          32.5    5  6  7         15.00    22.50
1  3          54.0    4  5  6         13.33    30.76
1  3          54.0    4  5  7         16.00    32.29
1  3          54.0    4  6  7         15.67    32.09
1  3          54.0    5  6  7         15.00    31.71
2  3          56.5    4  5  6         13.33    31.83
2  3          56.5    4  5  7         16.00    33.36
2  3          56.5    4  6  7         15.67    33.17
2  3          56.5    5  6  7         15.00    32.79


13.6 In actually conducting a survey of a city we must devise some way of
eliciting the information we desire. In fact, we probably will not have a
complete list of households and will have to devise some other approach.
If we are using door-to-door interviewers we might use a map of the city.
In such cases we will have to make some compromises with random
sampling because not all dwelling units will be shown on the map. For
example, we may choose blocks at random and take every third
residence within blocks. If we conduct a telephone survey, we will have to
be satisfied with existing telephone directories, realizing that some
people are not listed. We may use a mail-out questionnaire if we have a
reasonable list of residences (e.g., the telephone directory).
Regardless of the method we use, there will be major problems not
encountered with our miniature city example. Sometimes the
nonresponse rate will be very high. If we have planned a fairly small sample to
begin with, nonresponses may give us such a small sample that our work
is not informative. Other problems can arise if our instrument or
questionnaire has not been pretested to ensure that it makes sense to the
people being surveyed.
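The stratified calculations of Section 13.5 can be checked by enumeration in the same way as Table 13.2. In this sketch (standard library only; `stratified_mean` is a hypothetical helper), each of the 3 × 4 = 12 combinations of a north sample and a south sample is taken as equally likely, so `statistics.pvariance` (divisor equal to the number of samples) gives the variance of the sampling distribution.

```python
from itertools import combinations, product
from statistics import mean, pvariance

income = {1: 30, 2: 35, 3: 78, 4: 15, 5: 13, 6: 12, 7: 20}   # Table 13.1
north, south = [1, 2, 3], [4, 5, 6, 7]
n_north, n_south = 2, 3

def stratified_mean(north_sample, south_sample):
    """Weight each stratum mean by its stratum size: (3 * xbar_N + 4 * xbar_S) / 7."""
    xbar_n = mean(income[h] for h in north_sample)
    xbar_s = mean(income[h] for h in south_sample)
    return (len(north) * xbar_n + len(south) * xbar_s) / (len(north) + len(south))

# All 3 x 4 = 12 equally likely stratified samples
estimates = [stratified_mean(a, b)
             for a, b in product(combinations(north, n_north),
                                 combinations(south, n_south))]

print(len(estimates), round(mean(estimates), 2), round(pvariance(estimates), 2))
```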

13.7 We have already considered what happens when the population size, N,
is large relative to the sample size, n. This usually results in simpler
formulas, and we are therefore interested in random sampling from so-called
"infinite" populations. Consider a very large population with one-third
1s, one-third 2s, and one-third 3s, so that μ = 2 and σ² = 2/3.
Consider a sample of size n = 2 and consider the sampling distribution
of x̄. The possible samples with values of x̄ are given in Table 13.4.
From Table 13.4 we can easily obtain the sampling distribution of x̄ as
given in Table 13.5.


Table 13.4 Sampling an Infinite Population

Sample    x1    x2    x̄      Probability
1         1     1     1         1/9
2         1     2     3/2       1/9
3         2     1     3/2       1/9
4         1     3     2         1/9
5         3     1     2         1/9
6         2     2     2         1/9
7         2     3     5/2       1/9
8         3     2     5/2       1/9
9         3     3     3         1/9


Table 13.5 Sampling Distribution of x̄

x̄       Probability
1          1/9
3/2        2/9
2          3/9
5/2        2/9
3          1/9



133 SAMPLING DISTRIBUTIONS AND SURVEY SAMPLING 


From this table we can calculate the mean and variance of x̄ and find that

$$\mu_{\bar{x}} = 2 = \mu$$

$$\sigma^2_{\bar{x}} = \tfrac{1}{3} = \frac{\sigma^2}{n}$$
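The enumeration behind Tables 13.4 and 13.5 can be checked mechanically. The Python sketch below (an illustration, not part of the original text) lists every ordered sample of size 2 from the population of Section 13.7, treating successive draws as independent because the population is assumed to be very large, and uses exact fractions so that the results can be compared directly with μ = 2 and σ² = 2/3. The helper `mean_var` is a name introduced here.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# Population from Section 13.7: values 1, 2, 3, each with probability 1/3.
# Assumption: the population is so large that draws are independent.
population = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
n = 2

def mean_var(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# Enumerate every ordered sample (x1, x2), as in Table 13.4, and accumulate
# the probability of each value of the sample mean, as in Table 13.5.
xbar_dist = defaultdict(Fraction)
for sample in product(population, repeat=n):
    prob = Fraction(1)
    for x in sample:
        prob *= population[x]
    xbar_dist[Fraction(sum(sample), n)] += prob

mu, sigma2 = mean_var(population)
mu_xbar, var_xbar = mean_var(xbar_dist)

for value in sorted(xbar_dist):
    print(value, xbar_dist[value])
print("mu =", mu, " sigma^2 =", sigma2)
print("mean of xbar =", mu_xbar, " variance of xbar =", var_xbar,
      " sigma^2/n =", sigma2 / n)
```

Running it should list the five values of x̄ with probabilities 1/9, 2/9, 1/3, 2/9, 1/9 and report a mean of 2 and a variance of 1/3, the same as σ²/n.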

13.8 Many ideas concerning sampling distributions were published as early as 
1832 (Encke [4]) and many results were given in the 1860s (Airy [1]). 
However, the paper by Pearson [7] is often regarded as the beginning of 
modern sampling distribution theory. The development of the subject 
received a great impetus with the publication of “Student’s” paper in 1908.
Beginning with his paper [5] in 1915, Fisher obtained and published the 
sampling distributions of many statistics being used for the analysis of 
data. Following the work of Fisher, countless researchers have obtained 
the sampling distribution of virtually every conceivable statistic. 


SUMMARY. In recent years the U.S. Census Bureau has made 
extensive use of sampling to obtain some of the census information. 
When a simple random sample is taken from a population, the sample
mean, x̄, is used to estimate the population mean. When a stratified
random sample is taken, a weighted combination of the strata sample
means, $\bar{x}_{st}$, is used. Comparing the two estimates requires consideration
of the different values that could be obtained from all possible samples.
This leads to what is called a sampling distribution.

The sampling distributions of both x̄ and $\bar{x}_{st}$ have the same mean as
the original population. Therefore both x̄ and $\bar{x}_{st}$ are unbiased estimates
of μ. However, if the strata are chosen well (so that the values within
each stratum are similar), the variance of $\bar{x}_{st}$ will be
smaller than the variance of x̄.
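To see why well-chosen strata help, the sketch below enumerates every possible sample for a small hypothetical population. The values 1, 2, 3 ("north") and 7, 8, 9 ("south") are invented for illustration and are not the miniature city of the text. It compares a simple random sample of size 2 (drawn without replacement) with a stratified sample of one unit per stratum, weighting each stratum mean by its share N_h/N of the population; all names in the code are introduced here.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical population (illustrative numbers, not from the text), split
# into two strata whose values are similar within each stratum.
strata = {"north": [1, 2, 3], "south": [7, 8, 9]}
sample_sizes = {"north": 1, "south": 1}   # n_h drawn from each stratum
population = [x for values in strata.values() for x in values]
N = len(population)
n = sum(sample_sizes.values())

def mean(xs):
    return Fraction(sum(xs), len(xs))

def mean_and_variance(estimates):
    """Mean and variance over a list of equally likely estimate values."""
    m = mean(estimates)
    return m, mean([(e - m) ** 2 for e in estimates])

# Simple random sample of size n without replacement: every subset equally likely.
srs_estimates = [mean(s) for s in combinations(population, n)]

# Stratified sample: an independent simple random sample within each stratum;
# the estimate weights each stratum mean by N_h / N.
names = list(strata)
per_stratum = [list(combinations(strata[h], sample_sizes[h])) for h in names]
strat_estimates = [
    sum(Fraction(len(strata[h]), N) * mean(s) for h, s in zip(names, choice))
    for choice in product(*per_stratum)
]

print("population mean:", mean(population))
print("SRS:        mean, variance =", mean_and_variance(srs_estimates))
print("stratified: mean, variance =", mean_and_variance(strat_estimates))
```

Both estimators average exactly 5, the population mean, but for these made-up numbers the variance drops from 58/15 (simple random sample) to 1/3 (stratified), because almost all of the variation lies between the two strata rather than within them.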

The variance of the sample mean from a simple random sample is
given by

$$\sigma^2_{\bar{x}} = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}$$

If the population is quite large, infinite for practical purposes, the
variance of x̄ is given by σ²/n.
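The finite-population formula can be checked by brute force. In this sketch the population [1, 2, 3, 4, 5] is hypothetical, samples are drawn without replacement, and σ² is computed with divisor N, which is the convention the formula above assumes. The function names are introduced here for illustration. The last loop shows the factor (N - n)/(N - 1) approaching 1 for a fixed n as N grows.

```python
from fractions import Fraction
from itertools import combinations

def var_of_mean_by_enumeration(population, n):
    """Variance of the sample mean over all equally likely samples of size n
    drawn without replacement."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    m = sum(means) / len(means)
    return sum((x - m) ** 2 for x in means) / len(means)

def var_of_mean_by_formula(population, n):
    """(N - n)/(N - 1) * sigma^2 / n, with sigma^2 computed using divisor N."""
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((x - mu) ** 2 for x in population) / N
    return Fraction(N - n, N - 1) * sigma2 / n

# Hypothetical small population (not from the text).
pop = [1, 2, 3, 4, 5]
for n in (1, 2, 3, 4, 5):
    print(n, var_of_mean_by_enumeration(pop, n), var_of_mean_by_formula(pop, n))

# The correction factor (N - n)/(N - 1) approaches 1 as N grows with n fixed,
# which is why sigma^2 / n is used for "infinite" populations.
for N in (10, 100, 10_000):
    print(N, float(Fraction(N - 10, N - 1)))
```

For n = 2 both columns give 3/4; for n = 5 (the whole population) both give 0, since every sample then has the same mean.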
EXERCISES


1. Verify the value for the variance of the sample mean from Table 13.2.
Show that this same value can be obtained from the formula

$$\sigma^2_{\bar{x}} = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}$$


2. Suppose that in Section 13.2 we had drawn the sample {1, 2, 4, 6, 7}.
Use this sample to estimate μ. Estimate the variance of your estimate
of μ.

3. Suppose that we use stratified random sampling, resulting in the
choice of {1, 3} north of the river and {4, 6, 7} south of the river.
Estimate the average annual household income.



4. A stratified random sample was conducted to estimate the average
weekly food costs of married couples in a small city. The couples
were stratified into four strata: (a) couples aged 20 to 40 without
children, (b) couples aged 40 and up without children, (c) couples
with one or two children, and (d) couples with more than two
children. The stratum sizes, sample sizes, sample means, and sample
variances were as follows.
Stratum      N      n     x̄    s²
1         1000     10    44    64
2         2000     15    39    76
3         3000     25    58    46
4         4000     15    74    48


Estimate the average weekly food cost. 

5. Consider a very large population with the population distribution
given by:

x     P(x)
1     1/6
2     1/2
3     1/3

a. Calculate the mean, μ, and variance, σ².

b. For a sample of size 2, obtain the sampling distributions for x̄, s²,
and max(x₁, x₂).

c. For each of the distributions in part b, find the mean and
variance.

d. Verify that $\mu_{\bar{x}} = \mu$, $\sigma^2_{\bar{x}} = \sigma^2/n$, and


6. The owner of a chain of supermarkets is considering building
another store in a neighboring town and hires an opinion research
firm to conduct a survey of shoppers’ attitudes. A random sample of
50 households is selected from the town's population of 5000
households. One of the many questions asks the amount of the monthly

fn !; reported for the sample 

is $10,000, and the for the sample is $2000. Estimate:

a. The average monthly shopping budget for the town.

b. The variance of the sample mean.

c. The total monthly shopping budget for the town.

d. The variance of the estimate in part c.

7. A director of student advisement is making a study of why students
withdraw from the university after the freshman year. From the
many students who have withdrawn over the past 10 years, he
selects some at random and obtains 26 who say they dropped out
because of grade difficulties. The 26 students have an average grade
point average of 2.137 and an of 0.2638. What is the estimate of the
grade point average for those students who dropped out because of 
grade difficulties? What is your estimate of the variance of this 
estimate? If the first-year grade point average for those who finish is 
2.417, do you think that the data support the reason given for 
withdrawing? 

8. Draw a random sample of size 15 of two-digit numbers from Table
A.1. Then calculate the sample mean and estimate the standard
deviation of the sample mean. Are the results consistent with a 
population mean of 49.5 (the mean of the population being 
simulated by the random number table)? 

9. In a population some of the xs are one and the others are zero. If Np
of the xs are one and N(1 - p) = Nq are zero, show that

$$\sum_{i=1}^{N} (x_i - p)^2 = \sum_{i=1}^{N} x_i^2 - Np^2 = Np(1 - p) = Npq$$

10. A survey is conducted to estimate the fraction of people favoring a
certain proposition. Let x for an individual be one if she favors the
proposition, zero if she opposes it. Then show that the sample
fraction favoring the proposition, p̂, is equal to x̄. Also show that the
variance of p̂ is

$$\sigma^2_{\hat{p}} = \frac{N-n}{N-1} \cdot \frac{pq}{n}$$
THE

CENTRAL LIMIT
THEOREM


14.1 There is a theorem considered by many to be the most important
theorem in statistics. It is concerned with the distribution of a sum (or
mean) of random variables. As the number of random variables
increases, the distribution of the suitably standardized sum (or mean)
approaches a normal distribution as a limit. It seems natural to call the
theorem a limit theorem. Because of its importance or centrality to
statistics, it seems natural to call it the central limit theorem, the name
introduced by Pólya [3] in 1920.


14.2 As intimated when we discussed the normal distribution, a vast body of
statistical methods exists for the analysis of data which are normally
distributed. In fact, many sets of observations found in nature are well
described by normal distributions. In addition, many analyses are
performed on data that are sums (or means) of other observations
which may not be normally distributed. Let us cite two examples.


1. In educational research the basic data are often obtained from
responses to questionnaires. The response to each item on the
questionnaire may be one of 5 possible responses, -2, -1, 0, 1, 2.
The item responses are then highly discrete and the distribution of
responses nonnormal. However, we may be interested in a total score
for each respondent, and the total score may be a sum of as many as
50 item responses. For the distribution of total scores, normality is a
very plausible assumption.

2. In life testing of electronic devices the time to failure (operating time
before failure) of individual devices is recorded. Such failure times
typically have nonnormal distributions. However, the total (or
average) time to failure for a sample of several devices will have a
much more symmetric distribution and, for moderately large
samples, the normality assumption for the total (or mean) will be
reasonable.


14.3 Prior to De Moivre [2] there was no normal distribution and, naturally, 
no explicit recognition of approximate normality. The step that De 
Moivre made was simultaneously a step toward formulation of the 
normal distribution and a step toward formulation of the central limit 
theorem. De Moivre obtained what we would describe in modern 
terminology as a normal approximation to the binomial. 


14.4 As an empirical introduction to the central limit theorem from a 
viewpoint similar to that of De Moivre, let us consider the distribution of 
the binomial variable X as we allow n to increase, holding p constant. 
From Table A.2 we extract the following probabilities for p = 1/3 (see
Table 14.1). To see the effect as n increases from 1 to 5, consider the
pictures in Figure 14.1. As n increases from 1 to 5, we see three
characteristics of the normal distribution beginning to emerge.

1. Probability is being distributed over many points on the number line.

2. The distribution of probability is becoming more symmetric.

3. The characteristic bell-like shape is hinted at with n = 3 and is
unmistakable with n = 4 and n = 5.

As suggested by this example, the approach to normality as n increases
can be remarkably rapid: even with very small n, the distribution is
beginning to look normal.


Table 14.1 Binomial Probabilities (p = 1/3)

x    n = 1    n = 2    n = 3    n = 4    n = 5
0   0.6667   0.4444   0.2963   0.1975   0.1317
1   0.3333   0.4444   0.4444   0.3951   0.3292
2            0.1111   0.2222   0.2963   0.3292
3                     0.0370   0.0988   0.1646
4                              0.0123   0.0412
5                                       0.0041
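The entries of Table 14.1 follow from the binomial formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). A short Python sketch that recomputes the table for p = 1/3, rounded to four places as in the table, is given below; `binomial_pmf` is a helper name introduced here, while `math.comb` is the standard-library binomial coefficient.

```python
from math import comb
from fractions import Fraction

p = Fraction(1, 3)

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Rebuild Table 14.1: rows x = 0..5, columns n = 1..5, rounded to 4 places.
table = {
    (x, n): round(float(binomial_pmf(x, n, p)), 4)
    for n in range(1, 6)
    for x in range(0, n + 1)
}

print("x  " + "".join(f"  n = {n} " for n in range(1, 6)))
for x in range(6):
    cells = [f"{table[(x, n)]:8.4f}" if (x, n) in table else " " * 8
             for n in range(1, 6)]
    print(f"{x}  " + " ".join(cells))
```

Changing `p` or extending `range(1, 6)` is one way to produce the extra columns asked for in Exercise 1.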
14.5 Why does the binomial distribution of X approach a normal
distribution as n becomes infinite? In order to appeal to the central limit
theorem, we need to recognize that X, the number of successes in n trials,
is the sum of n variables. In fact,

X = number of successes on first trial
  + number of successes on second trial
  + ...
  + number of successes on nth trial

Each term is 1 if that trial is a success and 0 otherwise, so X is a sum
of n independent variables, all with the same distribution.
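The decomposition of X into single-trial counts can be checked numerically. The sketch below builds the distribution of X by adding one trial at a time, each trial contributing 0 or 1 success, and compares the result with the binomial formula for p = 1/3. The `convolve` helper is defined here for illustration and is not a library function.

```python
from fractions import Fraction
from math import comb

def convolve(dist_a, dist_b):
    """Distribution of A + B for independent discrete A, B given as {value: prob}."""
    out = {}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

p = Fraction(1, 3)
trial = {0: 1 - p, 1: p}          # successes on a single trial: 0 or 1

dist = {0: Fraction(1)}           # the sum of zero trials is 0 with certainty
for n in range(1, 6):
    dist = convolve(dist, trial)  # add one more trial's count of successes
    exact = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}
    print(n, dist == exact)
```

Each line should print `True`: the sum of n independent 0-or-1 variables has exactly the binomial distribution, which is what lets the central limit theorem speak about X.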


14.6 In order to understand the central limit theorem, it is important to follow
the process of sampling from a parent population. Suppose the original
population is uniform, as shown in Table 14.2. Next, let us consider the
joint distribution of X₁ and X₂, the observations in a random sample of
size 2. The joint distribution of X₁ and X₂ is shown in Table 14.3.
The numbers in parentheses give the sum, X₁ + X₂. The probability for
any value of the sum is then given by adding the probabilities on the
appropriate diagonal. For example, P(X₁ + X₂ = 4) = 3/16. Proceeding
in this way, we obtain the probability distribution for X₁ + X₂ (see
Table 14.4).


Table 14.2 Original Population Distribution

x     P(x)
1     1/4
2     1/4
3     1/4
4     1/4


Table 14.3 Joint Distribution of X₁ and X₂

              X₂:   1          2          3          4
           P(X₂):   1/4        1/4        1/4        1/4
X₁  P(X₁)
1   1/4             1/16 (2)   1/16 (3)   1/16 (4)   1/16 (5)
2   1/4             1/16 (3)   1/16 (4)   1/16 (5)   1/16 (6)
3   1/4             1/16 (4)   1/16 (5)   1/16 (6)   1/16 (7)
4   1/4             1/16 (5)   1/16 (6)   1/16 (7)   1/16 (8)


Table 14.4 Probability Distribution for X₁ + X₂

X₁ + X₂    P(X₁ + X₂)
2          1/16
3          2/16
4          3/16
5          4/16
6          3/16
7          2/16
8          1/16

As a next step, consider the joint distribution of X₁ + X₂ and X₃ in a
random sample of size 3, as shown in Table 14.5. The numbers in
parentheses are values for the sum X₁ + X₂ + X₃ and, as before, the
probability for any value of the sum is given by adding the probabilities
on the appropriate diagonal. Thus, for example,

$$P(X_1 + X_2 + X_3 = 5) = \tfrac{1}{64} + \tfrac{2}{64} + \tfrac{3}{64} = \tfrac{6}{64}$$

In this way we obtain the probability distribution for X₁ + X₂ + X₃
(see Table 14.6).

In an analogous manner we can obtain the probability distribution of
X₁ + X₂ + X₃ + X₄, as shown in Tables 14.7 and 14.8.


Table 14.5 Joint Distribution of X₁ + X₂ and X₃

                      X₃:   1          2          3          4
                   P(X₃):   1/4        1/4        1/4        1/4
X₁ + X₂  P(X₁ + X₂)
2        1/16               1/64 (3)   1/64 (4)   1/64 (5)   1/64 (6)
3        2/16               2/64 (4)   2/64 (5)   2/64 (6)   2/64 (7)
4        3/16               3/64 (5)   3/64 (6)   3/64 (7)   3/64 (8)
5        4/16               4/64 (6)   4/64 (7)   4/64 (8)   4/64 (9)
6        3/16               3/64 (7)   3/64 (8)   3/64 (9)   3/64 (10)
7        2/16               2/64 (8)   2/64 (9)   2/64 (10)  2/64 (11)
8        1/16               1/64 (9)   1/64 (10)  1/64 (11)  1/64 (12)
Table 14.6 Probability Distribution for X₁ + X₂ + X₃

X₁ + X₂ + X₃    P(X₁ + X₂ + X₃)
3               1/64
4               3/64
5               6/64
6               10/64
7               12/64
8               12/64
9               10/64
10              6/64
11              3/64
12              1/64


The probability distributions for the sample total are plotted in Figure 
14.2. As with the binomial distribution, it is startling that the bell-shaped 
phenomenon begins to occur with extremely small samples. With large 
samples from this original uniform distribution, the sum can, for all 
practical purposes, be regarded as normal. 


Table 14.7 Joint Distribution of X₁ + X₂ + X₃ and X₄

                                 X₄:   1           2           3           4
                              P(X₄):   1/4         1/4         1/4         1/4
X₁ + X₂ + X₃  P(X₁ + X₂ + X₃)
3             1/64                     1/256 (4)   1/256 (5)   1/256 (6)   1/256 (7)
4             3/64                     3/256 (5)   3/256 (6)   3/256 (7)   3/256 (8)
5             6/64                     6/256 (6)   6/256 (7)   6/256 (8)   6/256 (9)
6             10/64                    10/256 (7)  10/256 (8)  10/256 (9)  10/256 (10)
7             12/64                    12/256 (8)  12/256 (9)  12/256 (10) 12/256 (11)
8             12/64                    12/256 (9)  12/256 (10) 12/256 (11) 12/256 (12)
9             10/64                    10/256 (10) 10/256 (11) 10/256 (12) 10/256 (13)
10            6/64                     6/256 (11)  6/256 (12)  6/256 (13)  6/256 (14)
11            3/64                     3/256 (12)  3/256 (13)  3/256 (14)  3/256 (15)
12            1/64                     1/256 (13)  1/256 (14)  1/256 (15)  1/256 (16)
Table 14.8 Probability Distribution of X₁ + X₂ + X₃ + X₄

X₁ + X₂ + X₃ + X₄    P(X₁ + X₂ + X₃ + X₄)
4                    1/256
5                    4/256
6                    10/256
7                    20/256
8                    31/256
9                    40/256
10                   44/256
11                   40/256
12                   31/256
13                   20/256
14                   10/256
15                   4/256
16                   1/256
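Adding probabilities along the diagonals of a joint table is what is usually called a convolution. The minimal Python sketch below assumes independent draws from the uniform population of Table 14.2 and uses a helper `convolve` defined here (not from any library). It reproduces the numerators of Tables 14.4, 14.6, and 14.8.

```python
from fractions import Fraction

def convolve(dist_a, dist_b):
    """Distribution of A + B for independent discrete A and B ({value: prob})."""
    out = {}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

# Table 14.2: the uniform parent population on 1, 2, 3, 4.
parent = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}

total = parent
for size in (2, 3, 4):
    # Adding one more observation sums the "diagonals" of the joint table.
    total = convolve(total, parent)
    denom = 4 ** size
    numerators = [int(total[s] * denom) for s in sorted(total)]
    print(f"n = {size}: sums {min(total)}..{max(total)}, "
          f"numerators over {denom}: {numerators}")
```

It should print the numerators 1, 2, 3, 4, 3, 2, 1 (over 16); 1, 3, 6, 10, 12, 12, 10, 6, 3, 1 (over 64); and 1, 4, 10, 20, 31, 40, 44, 40, 31, 20, 10, 4, 1 (over 256), matching the three tables. Replacing `parent` with another distribution gives the sums asked for in Exercises 2 and 3.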


14.7 We have illustrated with two simple examples the primitive idea of the 
central limit theorem. Actually, there are many different versions of the 
central limit theorem. Adams [1] summarizes the important versions 
and gives a fascinating account of the development of the theorem. 


SUMMARY. A vast body of statistical theory and methods presupposes a
normal distribution. Often the raw data are obviously not well
described by a normal distribution, but analyses are performed on sums 
or averages of observations. These sums and averages tend to be 
normally distributed. A well-known limit theorem asserts this to be the 
case and, because of the importance to statistics, it is called the central 
limit theorem. 

The limiting normal phenomenon can be observed by graphing the
binomial distribution for increasing values of n or the Poisson
distribution for increasing values of λ.

Not only does the distribution of a sum of random variables
approach normality, but the convergence is also quite rapid. This is
observed by obtaining the distributions of X₁, X₁ + X₂, X₁ + X₂ + X₃,
X₁ + X₂ + X₃ + X₄, etc., when the observations are taken from a
discrete, highly nonnormal distribution. Very quickly the distributions
begin to acquire the symmetric bell shape.


EXERCISES

1. Extend Figure 14.1 by graphing the binomial probabilities for p = 1/3,
n = 6, 7, 8, 9, 10.

2. Given the following probability distribution:

x     P(x)
1     4/10
2     3/10
3     2/10
4     1/10


find the joint distribution of X₁ and X₂ and from that find the
probability distribution of X₁ + X₂. Next find the distributions of
X₁ + X₂ + X₃ and X₁ + X₂ + X₃ + X₄. Graph the distributions.
Does the convergence to normality seem more or less rapid than 
with the uniform population of Section 14.6? Why is this 
reasonable? 

3. Find the probability distribution for the sum of a random sample of 
size 5, starting with the population distribution given in Table 14.2. 

4. A student asked to discuss the central limit theorem on an exam 
wrote, “As the number of samples increases, x̄ becomes more nearly
normal.” What is wrong with this answer? 

5. A sample of size 10 is taken from a large population so that
$\sigma^2_{\bar{x}} = \sigma^2/n$. If μ = 8 and σ² = 40, standardize the sample mean
value x̄ = 12 by subtracting μ and dividing by the standard deviation
of x̄. If x̄ is approximately normal, use the standardized value and
Table A.3 to find the probability that x̄ < 12.
6. If the mean of a sample of size 16 from a population with μ = 15 and
σ² = 25 is approximately normal, find the following probabilities.

a. P(x̄ > 14).

b. P(13 < x̄ < 16).

c. P(x̄ < 14.5).

7. A busy intersection is observed over a long period of time, and the
number of arrivals at a red light is found to be a Poisson variable
with parameter λ = 5.2. We are interested in the average number of
arrivals (x̄) for 36 (n = 36) red lights. Use the central limit theorem
and Table A.3 to find the probability that x̄ exceeds 6.5.

8. A five-point rating scale is used on a questionnaire. If all of the
responses -2, -1, 0, 1, and 2 are equally likely, find the mean
response μ and the variance of the population of responses. What
would be the mean and variance of the average response, x̄, on a
questionnaire with 50 questions? Find the probability that x > I. 

9. Instead of the five-point scale used in Exercise 8, use the five-point
scale 1, 2, 3, 4, 5. What are the mean and variance if all scores are
equally likely? What are the mean and variance of the average
response, x̄, on a questionnaire with 50 questions? Find the
probability that x̄ > 3.

10. As n becomes larger, the binomial probabilities are more closely
approximated by normal distribution probabilities. For n = 100
and p = 1/2, find the probability that X = 50. If the histogram with
classes -0.5 to 0.5, 0.5 to 1.5, etc., is used, this probability is given by
the area of the bar of unit width centered on the class mark 50.
Approximate the binomial probability by the area under the normal
curve from 49.5 to 50.5 with μ = 50 (np) and variance 25 (npq).

11. For large n the binomial distribution is well approximated by the
Poisson (when p is small) and normal distributions. Therefore the
Poisson distribution with large λ should be well approximated by the
normal distribution. Graph the Poisson probability distribution for
λ = 10, 20, ..., 50.

12. For λ = 20, represent the Poisson distribution with a histogram
using class marks 1, 2, 3, .... Approximate the area of the bar with
class mark 20 by the area under the normal curve from 19.5 to 20.5
using μ = λ = 20 and σ² = λ = 20.

13. Draw 150 random samples of five one-digit numbers from Table A.1.
For each sample calculate x̄. Construct the frequency distribution
and a histogram for your samples. Note that the
distribution of x̄ is much more normal than the distribution of
individual xs.
14. An instrument is used in an educational research experiment that
consists of binary responses (yes-no, true-false, etc.) to 100 items. 
One of the responses is labeled zero and the other one. In a large 
population of equally likely zeros and ones, what is the mean and 
variance? If the responses on each item are equally likely, what is the 
mean and variance of the average response on 100 items? Using the 
central limit theorem, what is the probability that x̄ > 0.75?

15. The graduate committee of a statistics department claims that its 
new students are definitely better than average. The national mean 
and standard deviation on the Graduate Record Examination 
analytical score are known to be 583 and 117, respectively. The 20 
new students admitted to the department have an average analytical 
score of 624. If the 20 students were a random sample from the 
population, what is the probability of an average score as great as 
624? 

16. After several dry years in a certain city, people are predicting a major 
shift in weather patterns. The annual rainfall for the past 50 years has 
a mean of 42.36 inches and a standard deviation of 6.11 inches. The 
average rainfall for the last 5 years has been 35.72 inches. What is the 
probability of a sample mean this small using a sample of size 5 from 
a population with mean 42.36 and standard deviation 6.11? Does

this probability cast doubt on the hypothesis of a major weather 
change? 




ST. JAMES'S GATE BREWERY. W. S. Gosset went to work here in 1899.


Arthur Guinness Son and Company, Ltd. 




“STUDENT’S” t 


measured weekly and, at the end of the trial period, the increase in 
growth over the untreated hooves is recorded for each horse. The 
growth (in inches) for the 14 horses is as follows: 1.1, -1.0, 2.2, 1.4, 0.6,
0.1, 0.3, 0.4, 1.0, 0.7, -0.6, -0.2, 0.4, and 0.5. Calculate the t value for a
μ of zero. Find the probability of t being this large in absolute
magnitude. Do you feel that the data support the claim that the 
product stimulates hoof growth? 


OPINIONS, 

CONCLUSIONS, 

AND 

DECISIONS 



W. S. GOSSET. 1876-1936. The Granger Collection. 
host of other propositions, many not explicitly stated or perceived. In
fact, most propositions are justified by an argument that proceeds from a 

given set of statements called premises to a concluding statement called a 
conclusion. 


16.3 The process of reaching a conclusion from stated premises is often called
inference and, for the sake of simplicity, the study of inference usually
distinguishes two types: deductive inference (deduction) and inductive
inference (induction). The word inference is also often used to denote
inductive inference only. Following Skyrms [1], we will say that a 
statement is a factual claim, an argument is a list of statements beginning 
with premises and ending with a conclusion, and logic is the study of the 
evidential link between the premises and the conclusion. 

To illustrate the idea of a deductive argument, consider the following. 

Example 1 : Deductive Argument. 

Premises: John’s birthday is in the month of September. 

It rained on John’s last birthday. 

Conclusion: It rained in September. 

If we accept the premises in Example 1 as being true, we must accept the 
conclusion as being true. There is no escape from this. The conclusion 
follows with certainty from the premises. This argument illustrates the 
idea of a deductively valid argument. In general, an argument is said to be 
valid if, when the premises are true, the conclusion must be true. It 
should be noted that validity is not a measure of the certainty with which 
the conclusion follows from the premises. 

Example 2: Deductive Argument. 

Premises: John’s birthday is in the month of Octaroon. 

It rained on John’s last birthday. 

Conclusion: It rained in Octaroon. 

The argument in Example 2 is deductively valid. Is it true? No one 
knows. But, given that the premises are true, the conclusion follows with 
absolute certainty. Let us now consider an argument that is not 
deductively valid. 

Example 3: Inductive Argument. 

Premises: Last September was the rainiest on record. 

John’s birthday is in September. 

Conclusion: It rained on John’s last birthday. 
The conclusion in Example 3 does not follow with certainty from the
premises, even if they are true, so the argument is not deductively valid. It is,
however, quite plausible and, in some parts of the country, would be 
quite likely. In any locale where the rainfall is very low it would be quite 
unlikely. This example does serve as an illustration of an inductive 
argument. A basic problem in inductive inference is to devise ways of 
measuring the strength of an inductive argument. 

Most approaches to measuring the strength of an inductive argument 
have used probability in some way. Readers interested in such
approaches should be reminded at this point of the two divergent
interpretations of probability: probability as a measure of belief and 
probability as a physical constant in some system. To those embracing 
the first interpretation, it has seemed only natural to define the strength 
of an inductive argument as 

P(conclusion true | premises true) 

To those embracing the second interpretation of probability, it has 
seemed just as natural to reject this but to devise other measures that 
involve probability. In statistics, these two approaches have led to 
Bayesian and non-Bayesian statistics, respectively. 


16.4 It should be quite clear that a large part of statistics is concerned with the 
process, called statistical inference, of drawing conclusions about the 
population on the basis of information obtained from a sample from the 
same population. It should also be clear at this point that such a process 
is a process in inductive inference. From this point of view statistical
inference lies at the center of all experimental science. In anticipation of
lectures to follow, we now want to give an overview of some of the main 
topics of statistical inference. 


16.5 Many problems in statistical inference are motivated by real problems in
science. The statistical methods used are based on research that has 
assumed that the sample has come from a population well described by 
some particular model. For example, suppose we are interested in the 
relative effectiveness of two weight-reducing diets and we have data on 
weight loss from a sample of people for each of the diets. We may assume 
that the samples of weight losses are random samples from two 
populations of weight losses of people and that the two populations are 
well described by two normal populations with means μ₁ and μ₂,
respectively. Our methods are then based on a mathematical model of a
random sample from each of two normal populations. It should be
apparent that the choice of a model is a very important step. What is the 

process by which we choose a model or conclude that a model is highly 
plausible? 

The choice of model is itself an exercise in inductive inference and, if the 
choice is based on data, it is an exercise in statistical inference. An array 
of statistical methods has been devised to assist in the choice of a model; 
they range from simple graphical procedures to formal procedures called 
goodness-of-fit tests. Goodness-of-fit tests are so named because they 
give the user help in judging how well a population is described (or fitted, 
like a suit of clothes) by a given model. 


16.6 Once a model has been accepted as reasonable, it is natural to focus 
attention on the unknown constants, called parameters, that appear in 
the model. With a normal distribution we are concerned with μ and σ²,
with a Poisson distribution with λ, and so on. Formal procedures (called tests)
have been devised for judging how well hypothesized values of the 
parameters are supported by the sample data. The tests are usually 
called tests of significance and (or) tests of hypotheses. 

16.7 Quite often, particularly when the parameters of the mathematical 
model have a physical interpretation, it will be necessary to specify or 
guess a value for a parameter on the basis of the sample data. For 
example, if μ of a normal distribution represents the average family
income for a community, it may be necessary to make a guess for the
value of μ, based on a random sample of family incomes. Guesses of
parameter values that are based on sample data are called estimates. 
Estimates should be obtained from procedures that have known 
properties and that have been tried and proven in applications. 

Obviously, any estimated (or guessed) value for a parameter has an 
uncertainty attached to it, and one way of describing such uncertainty is 
by giving an interval of values for a parameter. Various methods of 
obtaining intervals have been studied; the best-known intervals are 
confidence intervals and Bayesian intervals. 


16.8 We have given an overview of some of the main branches of statistical 
inference here. We have indicated that statistical inference lies at the 
heart of experimental science. It is worth noting that statistical inference 
rests on the assumption of a mathematical model. Without knowledge of 
such models, statistical inference is a fairly narrow subject. 
SUMMARY. Logic is the study of the link between statements that
strengthens the justification of some when others are true. The process of 
reaching a conclusion from a given set of statements called premises is 
often called inference. Inference falls into two broad classes: deductive 
and inductive. Deductive inference is sometimes described as reasoning 
from the general to the particular, while inductive inference is from the 
particular to the general. With deductive inference the concern is with 
the validity of the argument. With inductive inference the concern is with 
the plausibility of the conclusion given that the premises are true. 

Statistical inference is concerned with drawing conclusions about the 
population on the basis of a sample. One of the first problems is the 
choice of a mathematical model for the random sample. Usually the next 
problem is the choice of parameter values in the model chosen. Bayesian 
statistical methods are directed toward finding probability statements 
about the parameters. Non-Bayesian methods, using a different
interpretation of probability, are concerned with estimates of the parameters,

tests for hypothesized parameter values, and intervals of reasonable 
parameter values. 


1. Skyrms, Brian. 
Dickenson. 
EXERCISES

1. In your daily newspaper note examples of arguments that are 
deductively valid and of arguments that are inductively strong. 

2. Note important arguments from your major field that are not
deductively valid but that are inductively strong. 

argument that concludes that ^ 10 
given the premises that x̄ = 15, σ = 6, and n = 36? How would you
measure the strength of the argument? 

4. How would you classify the argument that /i lies between "»5 md ^0 
given the premises .ha, ,v = 27.5, a = 4, and= 2," and'he'salt ,s 
5. A company has discriminatory practices against women as a group in
hiring and promotion of employees. Theo is a woman who formerly 
worked for the company. The company has discriminated against 
Theo. Is the previous argument a deductively valid argument? Is it 
inductively plausible? 

6. A spacecraft is programmed to return to earth intact. The chance of its 
striking human beings is calculated to be less than 1 in 1 million. 
Therefore it is quite unlikely that it will land in any metropolitan area. 
Is this a deductively valid argument? Is it inductively plausible? 

7. In June, before announcement of candidacy, a particular candidate is 
preferred by 10% of the people polled. In November, after the 
announcement, 12% of the people polled prefer the candidate, who 
claims to have gained support. Is the claim justified? How would you 
measure the strength of the claim? 

8. A coffee manufacturer claims that its coffee yields more cups to the 
pound than a competitive brand, which advertises 250 cups per 
pound. Ten pounds are purchased by the coffee club of a department; 
after using the first pound to determine the amount required for the 
proper taste, the number of cups per pound is recorded for the last 9 
pounds, giving 248, 254, 256, 253, 248, 250, 252, 246, and 256. 
Calculate the t value using a μ value of 250. What is the probability of
a t this large or larger? Do the data support the idea that the new 
brand gives more than 250 cups per pound? 





JERZY NEYMAN. University of California, Public Information. 
STATISTICAL

TESTS 


17.1 You will see something new.

Two things. And I call them
Thing One and Thing Two.*

The Cat in the Hat by Dr. Seuss

The first occurrence of a statistical test is uncertain, but an early
example is given by Arbuthnot [1]. He considered the period of 82 years
for which records were available on the number of males and females
born in London and noted that in each year males exceeded females. He
noted that under a “chance” hypothesis, the probability of such an
outcome would be $(1/2)^{82}$.
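Arbuthnot's figure is easy to check. The short Python sketch below assumes, as the “chance” hypothesis does, that each of the 82 years is independently like a fair coin toss. It computes the probability exactly with fractions.

```python
from fractions import Fraction

YEARS = 82   # years of London birth records considered by Arbuthnot

# Under the "chance" hypothesis each year is like a fair coin toss, so the
# probability that males exceed females in all 82 independent years is (1/2)^82.
p_exact = Fraction(1, 2) ** YEARS
print(p_exact)                  # 1/4835703278458516698824704
print(f"{float(p_exact):.3e}")  # 2.068e-25
```

A probability of about 2 in 10^25 makes the coin-toss explanation very hard to maintain.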

A second example of a statistical test is given by Stigler [5], who
describes the trial of the Pyx, a test that has been used by the Royal Mint 
in London for about 800 years. During these years, coins have been 
taken from those minted and placed in a box called the Pyx. At specified 
times the box would be opened and the contents counted, weighed, and 
assayed for the official purpose of determining that the coinage issued by 
the Royal Mint met specifications. The antiquity of the trial of the Pyx is 
well documented, and Stigler offers a good bibliography. Before the 
twentieth century certain statistical tests were known and used (e.g., the 
chi-square test of Pearson [4]), but only in the present century were 
statistical tests fully developed. The early papers of “Student” gave an 
impetus to the development of a theory of tests, but Fisher, in a series of 
papers from 1915 onward, gave statistical testing a great push forward. 
The next major thrust began with the publication in 1933 of the 
important paper by Neyman and Pearson [3]. 

* Reprinted by permission of the publishers, Random House, from The Cat in the Hat by
Dr. Seuss. Copyright 1957 by Random House. 
17.2 As conceived by Fisher [2, p. 13], a test of significance consisted of the
following items. 

1. A hypothesized value for a parameter of an assumed probability
distribution. 

2. A test statistic whose distribution is completely known when the 
hypothesized value is the true parameter value. 

3. A random sample from the population. 

4. Calculation of the statistic. 

5. Calculation of the probability of the test statistic being as extreme as 
actually observed when the hypothesis is true. 

The probability calculated in item 5 is called the significance level or 
observed significance level. If it is small we have the logical disjunction: 
either the hypothesis is false or the hypothesis is true and an event with 
low probability has occurred. 

Example 1 . A nonprescription remedy for cough and nasal congestion 
indicates on the label that it is 5% alcohol. A consumer believes that the 
amount of alcohol varies from bottle to bottle but that the average 
amount in all bottles produced may be in the neighborhood of 5%. He 
buys 10 bottles and performs a laboratory analysis on each bottle to find 
the amount of alcohol. The results in percent are 5.01, 4.87, 5.11, 5.21, 
5.03, 4.96, 4.78, 4.98, 4.88, and 5.06. If it is believed that the amount of 
alcohol is normally distributed, do the data support the hypothesized 
mean value of 5%? 

1. Hypothesis. μ = 5

2. Test Statistic.

$$t = \frac{\bar{x} - 5}{s_{\bar{x}}}$$

Note that if μ = 5, the distribution of this test statistic is completely
known and tabulated (“Student’s” t with 9 degrees of freedom).

3. Random Sample. Ten numbers given by the laboratory analyses.

4. Calculation of Test Statistic.

$$\sum x = 49.89, \qquad \bar{x} = 4.989, \qquad \sum x^2 = 249.0425$$
$$\sum (x - \bar{x})^2 = 0.14129$$
$$s^2 = \frac{0.14129}{9} = 0.0156989$$
$$s_{\bar{x}}^2 = \frac{s^2}{10} = 0.00156989, \qquad s_{\bar{x}} = 0.0396$$
$$t = \frac{4.989 - 5}{0.0396} = -0.2778$$

5. Calculation of Observed Significance Level. From Table A.5 we see 
that the probability of a t value this extreme (as large as + 0.2778 or as 
small as -0.2778) is about .80. This probability is very large and 
suggests that a mean value of 5% is quite consistent with the data. If, 
on the other hand, we had obtained a very small probability, we 
would have been forced to conclude that either the 5% mean was false 
or that an event with low probability had occurred. 
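Fisher's five steps for Example 1 can be reproduced in a few lines. This is a stdlib-only Python sketch. Instead of reading Table A.5, it gets the Student's t probability by integrating the t density numerically with Simpson's rule. The helper names `t_density` and `two_sided_p` are hypothetical and are not taken from any library.

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_obs, df, steps=2000):
    """P(|T| >= |t_obs|), integrating the density over [0, |t_obs|]
    with Simpson's rule (steps must be even)."""
    b = abs(t_obs)
    h = b / steps
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central = total * h / 3          # P(0 <= T <= |t_obs|)
    return 1 - 2 * central

# Step 3: the ten laboratory analyses (percent alcohol) from Example 1.
alcohol = [5.01, 4.87, 5.11, 5.21, 5.03, 4.96, 4.78, 4.98, 4.88, 5.06]
mu0 = 5.0                            # Step 1: hypothesized mean

# Step 4: the test statistic t = (x-bar - 5) / s_xbar.
n = len(alcohol)
xbar = sum(alcohol) / n
ss = sum((x - xbar) ** 2 for x in alcohol)   # sum of squared deviations
s_xbar = math.sqrt(ss / (n - 1) / n)         # estimated standard error of x-bar
t = (xbar - mu0) / s_xbar

# Step 5: observed significance level with n - 1 = 9 degrees of freedom.
p_value = two_sided_p(t, n - 1)
print(round(xbar, 3), round(ss, 5), round(t, 4), round(p_value, 2))  # 4.989 0.14129 -0.2776 0.79
```

The full-precision t differs slightly from the hand calculation because the standard error is not rounded to 0.0396. The two-sided probability, about .79, agrees with the table reading of “about .80.”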

Note that the observed significance level does not give the probability
that the null hypothesis is true. It is not a probability statement of any
kind about the parameter (or parameters). Consider the following
analogy. An accused person on trial pleads “guilty,” and experience
indicates that only 10% of innocent prisoners plead guilty. Then, given
the guilty plea, we feel that the prisoner is either guilty or innocent and an
event with probability .10 has occurred. However, this is not the same as
saying that the probability of innocence is .10. To state the matter in
symbols,

$$P(\text{“guilty” plea} \mid \text{innocent}) = .10 \neq P(\text{innocent} \mid \text{“guilty” plea})$$
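Bayes' theorem shows concretely why the two conditional probabilities differ. In this minimal sketch, only P(“guilty” plea | innocent) = .10 comes from the text. The prior probability of innocence and P(“guilty” plea | guilty) = .60 are hypothetical values chosen for illustration.

```python
def prob_innocent_given_guilty_plea(prior_innocent, p_plea_if_innocent, p_plea_if_guilty):
    """Bayes' theorem: P(innocent | "guilty" plea)."""
    joint_innocent = p_plea_if_innocent * prior_innocent
    joint_guilty = p_plea_if_guilty * (1 - prior_innocent)
    return joint_innocent / (joint_innocent + joint_guilty)

# P("guilty" plea | innocent) = .10 is the figure in the text; the prior
# probability of innocence and P("guilty" plea | guilty) = .60 are hypothetical.
for prior in (0.3, 0.8):
    posterior = prob_innocent_given_guilty_plea(prior, 0.10, 0.60)
    print(prior, round(posterior, 3))   # 0.3 0.067, then 0.8 0.4
```

The conditional probability .10 stays fixed, while the probability of innocence moves with assumptions the significance level never uses.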

The observed significance level is reported in many research journals 
and is often labeled simply the P value. In addition, many packaged 
statistical programs calculate the observed significance level routinely. 

17.3 In 1933 Neyman and Pearson published a paper on statistical tests [3]. 
In subsequent years they and their followers developed the theory of 
tests in a manner similar to but more formal than that of Fisher. As 
developed by Neyman and Pearson, a statistical test is a decision 
procedure for accepting one of two possible hypotheses about a 
parameter. One of the hypotheses is called a null hypothesis and the other 
an alternative hypothesis. If one then uses a procedure to accept or reject
the null hypothesis, there are four possible contingencies, as indicated in
Table 17.1.
Table 17.1

Possibilities in Testing

                                   Decision
State of Nature              Accept Null Hypothesis    Reject Null Hypothesis
Null hypothesis is true      No error                  Type I error
Null hypothesis is false     Type II error             No error


If we reject the null hypothesis when it is false or accept it when it is true, 
no error in judgment occurs; otherwise, we commit one of two possible
errors. These are called Type I error and Type II error. The error of 
rejecting the null hypothesis when it is true is a Type I error, and the error 
of accepting the null hypothesis when it is false is a Type II error. 

Naturally, we would like to eliminate both errors, but this is 
impossible. Instead, we are forced to settle for controlling the probability 
of such errors. We now introduce a little more terminology of Neyman- 
Pearson hypothesis testing. 

α = probability of Type I error
  = significance level
β = probability of Type II error
1 − β = power of the test

We would like both α and β to be small, but they seem to have minds of
their own. When we make α smaller, β tends to become larger; when we
make β smaller, α tends to become larger. The simple procedure usually
adopted is to fix α arbitrarily and to make β acceptably small by
increasing the sample size.
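The trade-off between α and β, and the effect of a larger sample, can be computed directly for a simple case. This sketch uses hypothetical values for a one-sided test: H₀: μ = 0 against μ = 0.5, with σ = 1 known, and the rule “reject H₀ when x̄ > c.” The function `error_rates` is illustrative only.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def error_rates(c, n, mu0=0.0, mu1=0.5, sigma=1.0):
    """Return (alpha, beta) for the hypothetical rule 'reject H0: mu = mu0
    when x-bar > c', with alternative mu = mu1 and known sigma."""
    se = sigma / math.sqrt(n)
    alpha = 1 - phi((c - mu0) / se)   # P(reject | mu = mu0)
    beta = phi((c - mu1) / se)        # P(accept | mu = mu1)
    return alpha, beta

# Fixed n = 25: raising the cutoff lowers alpha but raises beta.
for c in (0.2, 0.3, 0.4):
    alpha, beta = error_rates(c, n=25)
    print(f"c={c}: alpha={alpha:.3f}, beta={beta:.3f}")

# Fixed alpha = .05 (cutoff 1.645 standard errors above mu0): beta shrinks as n grows.
for n in (10, 25, 50):
    c = 1.645 / math.sqrt(n)
    print(f"n={n}: beta={error_rates(c, n)[1]:.3f}")
```

In the first loop, pushing down one error rate pushes up the other. In the second loop, α is held at .05 and only the sample size changes, which is the usual way of making β acceptably small.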

Using the trial-by-jury analogy, we might take the null hypothesis to
be “not guilty.” Under Anglo-Saxon law and traditions, the probability 
of a Type I error is kept very small, with the consequence that the 
probability of a Type II error is sometimes larger than would be desired. 

The components of a test following the Neyman-Pearson procedure
are as follows.

1. Null and alternative hypotheses for a parameter of an assumed 
probability distribution. 

2. α and β, or α and n.

3. A test statistic. 

4. A decision rule. 

5. A random sample from the population. 

6. Calculation of the test statistic. 

7. A decision either to accept the null hypothesis or to reject the null 
hypothesis obtained from the calculated value of the test statistic and 
the decision rule. 

Example 2. A manufacturer of a type of cracker indicates on each box 
that the box contains 269 grams. He is anxious not to ship boxes that 
have less weight than this and the quality control department designs a 
test procedure with the following components, assuming that the weight 
is a normal variable: 

1. Null Hypothesis. μ = 269.

Alternative Hypothesis. μ < 269.

2. α = 0.05

n is set at n = 30 after some consultation.

3. Test Statistic.

$$t = \frac{\bar{x} - 269}{s_{\bar{x}}}$$

4. Decision Rule. Reject the null hypothesis if t < −1.699.

5. Random Sample. Thirty boxes are selected at random from the
production line, resulting in a mean x̄ = 268 and s = 1.8.

6. Calculation of the Test Statistic.

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{1.8}{\sqrt{30}} = 0.3286$$

$$t = \frac{\bar{x} - 269}{s_{\bar{x}}} = \frac{-1}{0.3286} = -3.04$$

7. Because the calculated t is less than -1.699, we reject the null 
hypothesis and conclude that the production line is not meeting the 
advertised claim of 269 grams per box on the average. 
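The calculation and decision in Example 2 reduce to a few lines. This sketch uses the summary statistics from the example and the tabulated critical value −1.699 (29 degrees of freedom, lower 5% point) quoted in the text; it does not compute that value.

```python
import math

n, xbar, s = 30, 268.0, 1.8   # summary statistics reported in Example 2
mu0 = 269.0                   # hypothesized mean weight (grams)
t_crit = -1.699               # tabulated t, 29 df, lower 5% point (from the text)

s_xbar = s / math.sqrt(n)     # estimated standard error of x-bar
t = (xbar - mu0) / s_xbar
decision = "reject" if t < t_crit else "accept"

print(round(s_xbar, 4), round(t, 2), decision)  # 0.3286 -3.04 reject
```

Carrying full precision gives t ≈ −3.04. This is well below −1.699, so the decision to reject the null hypothesis holds.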
17.4 In many cases a statistical test is used more or less to assess the data and
not to reach any firm decision. This seems to have been the attitude of 
Fisher, who was thinking of research situations. In other cases the 
background of the problem demands a clear decision. This sort of 
situation seems to be better handled with some sort of decision rule, such 
as the Neyman-Pearson approach. We will discuss the matter further in 
Lecture 33. 


SUMMARY. Traces of statistical tests are very old. One of the earliest 
is the trial of the Pyx in London, a test applied to coins by the Royal Mint 
for over 800 years. However, in the twentieth century, statistical tests 
have been studied, refined, and used widely. Theoretical development 
began with the work of Pearson in 1900 and the extremely important 
work of “Student” in 1908. Fisher contributed heavily to the development
of tests from 1915 on.

Given a hypothesized value for a parameter, a test statistic is calculated 
from a random sample of observations. The observed significance level is 
the probability that a statistic as extreme as the one observed could have 
occurred by chance when the hypothesis is true. This value is a measure
of plausibility of the hypothesized value.

Neyman and Pearson formulated a test as a more formal accept-reject 
rule. This theory requires specification of both a null and alternative 
hypothesis. Rejecting the null hypothesis when it is true and accepting
it when it is false are called Type I and Type II errors,
respectively. Although these errors cannot be avoided, tests can be 
designed to control their probabilities. 


EXERCISES

1. Given a random sample of size 25 from a normal population with
mean μ = 10 and σ = 4, find



2. If x̄ = 12 for the sample of Exercise 1, find

$$Z_{\text{observed}} = \frac{12 - 10}{4/\sqrt{25}}$$

What is $P(Z > Z_{\text{observed}})$?

3. We are given a random sample of size 25 from a normal population
with unknown mean, μ, and σ = 4. It is hypothesized that the mean
is 10, but there is some concern that it may be greater than 10.
Calculate the observed significance level if x̄ = 12. Do the data
support the hypothesized value of μ?

4. We have designed a sampling inspection plan for a manufacturing
line producing 5-ohm resistors. We take a random sample
of 10 resistors and calculate the sample average resistance
x̄. If |x̄ − 5| < .5, we accept the null hypothesis that μ = 5. If
|x̄ − 5| > .5, we accept the alternative hypothesis that μ ≠ 5.

a. If σ = 4, calculate the level of significance; i.e., calculate
P(|x̄ − 5| > .5) given that μ = 5 and σ = 4.

b. With σ = 4, calculate the probability of accepting the null
hypothesis when μ = 6; i.e., calculate P(|x̄ − 5| < .5) given that
μ = 6 and σ = 4.

5. Does α + β = 1? Why or why not?

6. The keypunch operators in a computer center claim that at least 
90% of cards punched are free of errors. A sceptical user of the center 
selects 10 cards at random from a day’s output and finds 3 cards with 
errors. Calculate the observed significance level for the null
hypothesis that p ≤ .10 versus the alternative hypothesis that p > .10,
where p is the fraction of cards with errors.

7. A larger study than in Exercise 6 is made of the computer center. One
hundred cards are selected at random, and 15 cards are found with
errors. Using the central limit theorem, find the probability that
$Z > Z_{\text{observed}}$, where

$$Z_{\text{observed}} = \frac{15 - 100p}{\sqrt{100p(1 - p)}}$$

and where p = .10. Do the data support the computer center claim?

8. If male and female births were equally likely, the Arbuthnot data of
Section 17.1 would be very surprising. Consider a less surprising set
of data. Suppose that in 82 years male births exceeded female births
in only 62 years. Calculate the observed significance level for the null
hypothesis that p = 1/2 versus the alternative hypothesis that p ≠ 1/2.
Use the Z statistic of Exercise 7 with p = 1/2 and find
$P(|Z| \ge |Z_{\text{observed}}|)$.


9. Muffins baked from flour containing 15% peanut flour are given to a
taste panel of 16 people. Each muffin is given a score on a five-point
rating scale on each of 10 characteristics. Then the average score
over all 10 characteristics is recorded for each muffin. The 16 scores
are 3.1, 3.2, 3.1, 2.8, 2.9, 2.6, 3.0, 3.2, 3.7, 3.9, 3.3, 2.7, 3.2, 2.4, 2.8, and
3.3. Using α = .05, test the null hypothesis that μ = 3.0 (wheat flour)
versus the alternative hypothesis that μ ≠ 3.0. Accept or reject the
null hypothesis.

10. A particular bank offers several different accounts and assesses no 
charge on a checking account with a $500 average daily balance.
Since no interest is accrued on checking account balances, the
supposition is that customers try to maintain small balances as close
to the required balance as possible. A quick check of checking
account balances gives the following 10 average daily balances: 491,
523, 486, 494, 532, 564, 488, 422, 562, and 541. Using α = .05, test the
null hypothesis that μ = 500 against the alternative hypothesis that
μ > 500. Accept or reject the null hypothesis.

11. Student evaluations are made on every course at a certain
university. In a large department the average instructor rating is
about 3.25 year after year. In 20 upper-division sections the ratings
are 3.05, 3.31, 3.34, 3.82, 3.30, 3.16, 2.84, 3.10, 2.90, 3.18, 2.88, 3.22,
3.28, 3.34, 3.62, 3.28, 3.30, 3.22, 3.54, and 3.30. Calculate the observed
significance level for the hypothesis that μ = 3.25 against the
alternative that μ > 3.25. Do the data give strong support to the idea
that upper-division ratings are kinder to the instructors? 

12. The people in a neighborhood made a count of the number of 
children coming to their houses for trick or treat on Halloween 
night. They believed that some children were coming from across 
town to their neighborhood. The numbers counted were 88, 112, 96,
92, 93, 98, 89, 101, 86, 94, 95, 68, 72, 102, 94, 81, 84, 85, 92, 87, 76, and
89. Calculate the observed significance level for a mean of 84 against
a hypothesis of a greater mean. 

13. The state coordinating board for higher education in a certain state 
tries to monitor enrollment trends. The changes in enrollment from 
the previous year are reported by 24 colleges. The changes are 224,
208, -111, 362, 15, -76, -282, -106, 104, 172, -108, -222, 500,
46, 215, 178, 337, 206, 79, 42, 39, 162, -317, and -226. Do the data
indicate a decrease or an increase in the true enrollment? Use a t test
to test the null hypothesis of a zero mean versus μ ≠ 0. Use α = .05.

14. An entomology researcher is constructing a mathematical model to 
describe the impact of boll weevils on cotton plants. She has 
measured the number of blooms per cotton plant on each day of the 
growing season. On one particular day the numbers are 42, 26, 15, 
19, 27, 32, 26, 24, 17, 19, 22, 18, 24, 16, 14, 30, 22, 25, 27, and 29. She
wishes to use 20 as the mean number of blooms on this particular 
day. Is this a reasonable number? Base your answer on a t test. 

15. A horticulturist has conducted a greenhouse experiment to study the 
effect of watering rates and feeding rates on chrysanthemum plants. 
At the end of the season, he measures the diameter of the largest 
bloom on each plant. The diameters, in inches, are 6^, 6^, 5|, 8^, 1^, 

7-i^, 5j|, 61, 6i, 6^, 7i, 8^, and 1\. Calculate the observed 
significance level for the hypothesis that the mean diameter of the 
largest bloom is 6 inches against the alternative of a larger diameter. 




E. S. PEARSON. 1895-1980. Keystone Press Agency. 
ESTIMATION


18.1 Tuppence patiently, cautiously, trustingly invested in the, to be specific,
in the Dawes-Toomes-Mousley-Grooms Fidelity Fiduciary Bank.*

Mary Poppins 

Although Jane and Michael Banks were not convinced by Mr. Dawes 
of the desirability of investing their tuppence in the bank, millions of 
people do place their trust and confidence in banks and other fiduciary 
establishments. The words fiducial and fiduciary derive from the Latin 
word fiducia, for trust. In this lecture we discuss the estimation of 
parameters by methods that we consider to be trustworthy.

* “Fidelity Fiduciary Bank,” copyright © 1963 Wonderland Music Company, Inc.
Words and Music by Richard M. and Robert B. Sherman.

18.2 In an earlier section we referred to point estimation as the process of
calculating specific values from sample data to be used for values of
parameters occurring in the population probability distribution. The
development of a theory of point estimation began essentially with the
work of Fisher in the early 1920s. In two major papers Fisher [1, 2]
clearly stated the problem of estimation and developed several methods
of measuring the goodness of an estimate. Phrased another way, he
developed several methods of measuring our faith, or confidence, in an
estimate. We will discuss only two properties of estimates: unbiasedness
and minimum variance.

18.3 In an earlier section we demonstrated that the mean of the sampling
distribution of x̄ is the mean of the parent population, μ. This property of
x̄ is underscored by saying that x̄ is an unbiased estimate of μ. As used in
this technical sense, unbiasedness has nothing to do with fairness or lack
of prejudice. It simply means, as already stated, that the average value of
all possible x̄s is the value of μ, which we are estimating, or, in symbols,

$$\text{Ave}(\bar{x}) = \mu$$

Another way of verbalizing this property is to say that x̄ will give us the
correct value for μ on the average.

In more general notation, we will say that θ̂ (theta hat) is an unbiased
estimate of θ if the average value of all possible values of θ̂ is equal to θ or
if, in symbols,

$$\text{Ave}(\hat{\theta}) = \theta$$

As another example of an unbiased estimate, consider the sample
variance s². We demonstrated earlier, in sampling from an infinite
population, that the average value of all possible values of s² is equal to
σ², the variance of the parent population. So we say that s² is an unbiased
estimate of σ². To challenge students who are beginning to feel that they
understand the idea, we would like to point out that s is not an unbiased
estimate of σ.
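The claim that s² is unbiased for σ² while s is biased for σ can be checked exactly on a toy example. This sketch assumes a hypothetical three-value population {0, 2, 4} sampled with replacement, so that draws are independent, as in sampling from an infinite population. It enumerates all nine equally likely ordered samples of size 2.

```python
import math
from fractions import Fraction
from itertools import product

population = [0, 2, 4]                 # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(y) - mu) ** 2 for y in population) / N   # divisor N

def sample_variance(sample):
    """s^2 with divisor n - 1, computed exactly with fractions."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return sum((Fraction(x) - xbar) ** 2 for x in sample) / (n - 1)

# All 9 equally likely ordered samples of size 2 drawn with replacement.
samples = list(product(population, repeat=2))
avg_s2 = sum(sample_variance(s) for s in samples) / len(samples)
avg_s = sum(math.sqrt(sample_variance(s)) for s in samples) / len(samples)

print(sigma2, avg_s2)                                 # 8/3 8/3
print(round(math.sqrt(sigma2), 3), round(avg_s, 3))   # 1.633 1.257
```

The average of s² equals σ² exactly. The average of s falls short of σ because taking square roots does not commute with averaging.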

Suppose we wish to estimate the average length of a queue at a bus 
stop. We might select times at random during a week and count the 
number of people standing in line for the bus at these times; we would 
obtain the following data: 10, 2, 15, 23, 7, 0, 5, 3, 8, 13. From these sample
data, we wish to estimate the parameter θ, the average queue length
during the week. How should we do this? I have asked this question of
several classes of beginning students. Invariably, the most popular answers
are to estimate θ either by θ̂₁ = sample mean = 8.6 or by θ̂₂ = 7.0
(sample mean after deleting the suspiciously large number 23). The
sample median, θ̂₃ = 7.5, is also usually suggested.

Without further assumptions about the frequency distribution of the
population, we can claim unbiasedness for the sample mean, θ̂₁, only. If
we assume that the distribution is symmetric about θ, both θ̂₁ and θ̂₃ are
unbiased. Although θ̂₂ seems to be a reasonable ad hoc way to estimate
θ, it will generally be biased downward.
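The three candidate estimates for the bus-queue data are easy to reproduce. This sketch takes θ̂₂ to mean “drop the largest observation and average the rest,” which matches deleting 23 here. For these data it gives 63/9 = 7.0.

```python
import statistics

queue = [10, 2, 15, 23, 7, 0, 5, 3, 8, 13]   # bus-queue counts from the text

theta1 = sum(queue) / len(queue)              # sample mean
trimmed = sorted(queue)[:-1]                  # drop the largest value (23)
theta2 = sum(trimmed) / len(trimmed)          # mean of the remaining nine
theta3 = statistics.median(queue)             # sample median

print(theta1, theta2, theta3)                 # 8.6 7.0 7.5
```

Only θ̂₁ is guaranteed to be unbiased. Because θ̂₂ always discards the top value, it tends to come out low.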


18.4 Given several unbiased estimates of a parameter θ, we need some other
criterion for comparing them and choosing between them. Consider the
sampling distributions of θ̂₁ and θ̂₃ given that the population distribution
is symmetric about θ (e.g., normal). We could depict these
sampling distributions as in Figure 18.1. Although both estimates are
centered on the true value, the sampling distribution of θ̂₁ is more
concentrated and would be preferred.


[Figure 18.1. Sampling distributions of θ̂₁ and θ̂₃.]



Given two or more unbiased estimates, an obvious way to compare 
them is by comparing their variances. The estimate with the smallest 
variance will generally have a sampling distribution that is most 
concentrated about θ and is, therefore, preferred. An estimate with the
smallest variance of all possible unbiased estimates is called a minimum 
variance unbiased estimate, or sometimes a best estimate. The word best 
should be used advisedly. Sometimes a novice may misinterpret it to 
mean that the estimate is best in all sorts of ways, other than having 
minimum variance among the unbiased estimates. 

The sample mean is undoubtedly the most common way of estimating 
the population mean. Regardless of the form of the population 
distribution, the sample mean is unbiased. Furthermore, if the population
is normal, the sample mean is a minimum variance unbiased
estimate. 
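The minimum variance property can be seen in a small Monte Carlo experiment. This sketch compares the sampling variance of the mean and the median for normal samples. The settings are assumptions chosen for illustration: a standard normal population, samples of size 11, 20,000 repetitions, and a fixed seed.

```python
import random
import statistics

def population_variance(values):
    """Average squared deviation from the mean (divisor len(values))."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def compare_mean_and_median(n=11, reps=20000, mu=0.0, sigma=1.0, seed=1):
    """Monte Carlo sampling variances of the sample mean and sample median
    for samples of size n from a normal population (assumed settings)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    return population_variance(means), population_variance(medians)

var_mean, var_median = compare_mean_and_median()
print(round(var_mean, 4), round(var_median, 4))  # var_mean near 1/11 = 0.0909; var_median larger
```

Both estimates center on μ = 0, but the median's sampling variance is noticeably larger. This illustrates why the mean is preferred for normal data.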

For the bus queue example, it is unlikely that the population
is normal or even symmetric. Instead, the distribution is likely to be 
skewed to the right, with many small numbers and a few very large 
numbers. In such a case, the sample mean is unbiased but not minimum 
variance unbiased. Without further assumptions, we cannot say what 
the minimum variance unbiased estimate is. 
18.5 A very natural way of expressing belief in an estimate is to give an interval
within which we are reasonably certain that a population parameter lies. 
For example, engineers commonly add and subtract an error of 10% as 
an arbitrary safety factor. We now want to go through a simple 
probability argument and give several different interpretations that have 
flowed from it. Suppose that we have a random sample of size n from a 
normal population with mean μ and variance σ², with σ known but μ
unknown. Now we can obtain from the standard normal table that

$$P(-1.96 < Z < 1.96) = .95$$

Also, we know that

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

is a standard normal variable, so

$$P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95$$

Multiplying all terms inside the parentheses by σ/√n, we obtain

$$P\left(-1.96\,\sigma/\sqrt{n} < \bar{x} - \mu < 1.96\,\sigma/\sqrt{n}\right) = .95$$

Next, we subtract x̄ from each term to obtain

$$P\left(-\bar{x} - 1.96\,\sigma/\sqrt{n} < -\mu < -\bar{x} + 1.96\,\sigma/\sqrt{n}\right) = .95$$

Finally, we multiply all terms by −1, reversing the inequalities, to obtain

$$P\left(\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}\right) = .95$$

Thus we obtain the interval from x̄ − 1.96σ/√n to x̄ + 1.96σ/√n as an
estimate of μ.
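The interval just derived is a one-line computation. The inputs in this sketch are hypothetical and are borrowed from the example of Section 18.8: n = 25, σ = 4, and an observed x̄ = 10. The function name `z_interval` is illustrative.

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    """x-bar -/+ z * sigma / sqrt(n): an interval for mu when sigma is known."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs (borrowed from Section 18.8): n = 25, sigma = 4, x-bar = 10.
low, high = z_interval(10.0, 4.0, 25)
print(round(low, 3), round(high, 3))   # 8.432 11.568
```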

18.6 Fisher published a paper called “Inverse Probability” in 1930 [3] in 
which he introduced the idea of fiducial probability, which resulted in 
fiducial intervals. At about the same time, Neyman developed the idea of 
confidence intervals. The idea of confidence intervals first appeared in 
print in 1932 in a booklet by Pytkowski [5]. For the next 10 years or so,
numerous books and articles described confidence intervals and fiducial
intervals as being the same in concept but differently named. Gradually,
as Fisher and Neyman developed their ideas, it became clear that fiducial
intervals and confidence intervals were quite different. 

The fiducial interval interpretation for the interval x̄ ± 1.96σ/√n
obtained in Section 18.5 can be summarized as follows. 

1. The probability is for the specific value of x̄ obtained in the sample.

2. The probability statement is made after the sample is taken and x̄ is
calculated.

3. The probability is an expression of belief about the value of μ, a fixed
parameter.

By way of contrast, the confidence interval interpretation for the interval 
is as follows. 

1. The probability is not for the specific value of x̄ obtained in the
sample.

2. The probability statement is made before the sample is taken and x̄ is
calculated.

3. The probability is the relative frequency with which intervals
calculated using the formula will contain the parameter μ.
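The relative-frequency reading can be illustrated by simulation: draw many samples, build the interval from each, and count how often it covers μ. In this sketch, μ = 10, σ = 4, n = 25, the number of repetitions, and the seed are all illustrative assumptions.

```python
import math
import random

def coverage(mu=10.0, sigma=4.0, n=25, reps=10000, z=1.96, seed=2):
    """Fraction of simulated intervals x-bar -/+ z*sigma/sqrt(n) that contain mu.

    mu, sigma, n, reps, and seed are illustrative assumptions."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage()
print(rate)   # close to .95; the exact figure depends on the seed
```

The .95 describes this long-run hit rate of the procedure. It says nothing about any single computed interval.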

It is clear, as far as this example is concerned, that the difference is a 
difference of interpretation. It results from the two different ways of 
thinking about probability that we have mentioned previously: probability
as a measure of belief and probability as a property of a physical
system. 

Fisher’s concept of fiducial probability was not generally accepted. 
Instead, Neyman’s confidence interval theory became extremely well 
known and is very widely taught. 


18.7 In recent years Bayesian confidence intervals have been increasing in 
favor. As far as the example of Section 18.5 is concerned, the Bayesian 
interpretation is similar to the fiducial interpretation. The probability 
interpretation is that of degree of belief, and 95% confidence for the
interval on μ from x̄ − 1.96σ/√n to x̄ + 1.96σ/√n is interpreted as a
measure of the belief attached to the statement that μ lies in the interval.
Although the interpretation of probability is similar for Bayesian
intervals and fiducial intervals, the method of obtaining the intervals is
quite different. Bayesian intervals are obtained by assigning probabilities
to μ before the sample is drawn and modifying these probabilities
using Bayes’ theorem after the sample is drawn.
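One common way to carry out this program is the conjugate normal model. Here the prior on μ is assumed normal, and the data are normal with σ known. This sketch uses that model as an assumption; it is not the only possible Bayesian choice. The data values and prior settings are hypothetical.

```python
import math

def normal_posterior_interval(xbar, sigma, n, prior_mean, prior_sd, z=1.96):
    """Central 95% posterior interval for mu under a normal prior
    N(prior_mean, prior_sd**2) and normal data with known sigma."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * xbar)
    half_width = z * math.sqrt(post_var)
    return post_mean - half_width, post_mean + half_width

# Hypothetical data: n = 25, sigma = 4, x-bar = 10.
informative = normal_posterior_interval(10, 4, 25, prior_mean=8, prior_sd=1)
vague = normal_posterior_interval(10, 4, 25, prior_mean=8, prior_sd=1e6)
print(informative)   # pulled toward the prior mean of 8
print(vague)         # essentially 10 -/+ 1.96 * 4 / 5
```

With a very diffuse prior, the Bayesian interval is numerically the same as the interval of Section 18.5, and only its interpretation differs. An informative prior shifts and narrows it.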
18.8 A slightly different way of obtaining intervals of values supported by the
data is given by Kempthorne and Folks [4]. Proceeding from a test of a
null hypothesis, the significance level, SL, is graphed versus possible
hypothesized values for μ. For example, suppose we draw a random
sample of size n = 25 from a normal distribution with unknown mean μ
and variance σ² = 16. With H₀: μ = μ₀ versus H₁: μ ≠ μ₀, we would
evaluate the significance level by finding the probability of a standard
normal variable exceeding √n(x̄ − μ₀)/σ in absolute value, that is, by

$$SL = P\left(|Z| > \left|\frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma}\right|\right)$$

Given the results of the sample, we can evaluate the significance level for any
specified value of μ₀. Suppose we obtained x̄ = 10. Then for

$$\mu_0 = 10, \quad SL = P(|Z| > 0) = 1$$

for

$$\mu_0 = 12, \quad SL = P\left[|Z| > \left|\sqrt{25}(10 - 12)/4\right|\right] = P(|Z| > 2.5) = .0124$$

and for

$$\mu_0 = 9, \quad SL = P\left[|Z| > \left|\sqrt{25}(10 - 9)/4\right|\right] = P(|Z| > 1.25) = .2112$$

Proceeding in this way, we can plot the significance level versus μ₀, as in
Figure 18.2.

Given any significance level, we can obtain the set of μ values in
agreement with (in consonance with) the data. These intervals were 
called consonance intervals by Kempthorne and Folks. 

The intervals obtained are exactly the same as confidence intervals, 
but the interpretation is different. In any case, the graph in Figure 18.2 is 
worth constructing. 
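The significance-level curve and the consonance interval can be computed directly for the example above (n = 25, σ = 4, x̄ = 10). This sketch scans a grid of hypothesized means. The grid range, the step of .001, and the choice of .05 as the cutoff level are illustrative assumptions.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def significance_level(mu0, xbar=10.0, sigma=4.0, n=25):
    """Two-sided SL = P(|Z| > |sqrt(n)(x-bar - mu0)/sigma|)."""
    z = abs(math.sqrt(n) * (xbar - mu0) / sigma)
    return 2 * (1 - phi(z))

for mu0 in (9, 10, 12):
    print(mu0, round(significance_level(mu0), 4))  # 9 0.2113, 10 1.0, 12 0.0124

# Consonance interval at level .05: grid values of mu0 whose SL is at least .05.
grid = [8 + i / 1000 for i in range(4001)]   # mu0 from 8 to 12 in steps of .001
consonant = [m for m in grid if significance_level(m) >= 0.05]
print(min(consonant), max(consonant))        # close to 10 -/+ 1.96 * 4/5
```

Up to the grid spacing, the endpoints agree with the 95% confidence interval 10 ± 1.568.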


18.9 In this lecture we have presented some of the underlying concepts of
point and interval estimation. The ideas may seem somewhat abstract to 
beginning statistics students who are not accustomed to considering 
alternative estimates. Yet it is evident that in matters of national concern 
(air pollution, nuclear radiation, energy supply, etc.), many alternative
estimates are feasible from the same set of data. We have tried to 
emphasize in this lecture that the goodness of an estimation procedure 
cannot be judged from its performance on a single set of data. Instead, 
the goodness of the procedure must be based on its properties within a 
broader context. With the relative frequency interpretation of probability,
the performance for many possible samples is studied. With a
measure of belief interpretation, the performance is judged on the basis 
of a broader probability model. In both cases we must consider more 
than the single set of data at hand. 


SUMMARY. Point estimation is the process of calculating a specific 
value, called an estimate, from sample data to be used for the parameter 
value in a model. The goodness of an estimate cannot be judged from a 
single sample; it is judged from the behavior of the method of estimation 
in the totality of all possible samples. 

An unbiased estimate is obtained by a procedure such that the average 
of all possible values equals the parameter being estimated. In the class of 
all unbiased estimates, the one with the smallest variance is called the 
minimum variance unbiased estimate. 
Intervals of plausible parameter values are sometimes called interval
estimates. In the early 1930s Fisher and Neyman put forward independent
formulations of interval estimates. Fisher’s fiducial interval
carried with it a probability statement about the parameter value for a 
specified sample. By contrast, Neyman’s confidence interval carried with 
it a probability statement about intervals for a specified parameter. 
Bayesian confidence intervals are similar in spirit to fiducial intervals but 
are obtained by the use of Bayes’ theorem. 
EXERCISES

1. Choose a random sample of the students in your class and obtain an 
unbiased estimate of the average height of all the students in the 
class. 

2. Let t₀ be a tabulated “Student’s” t value such that

$$P(-t_0 < t < t_0) = \text{desired confidence}$$

For example, with 15 degrees of freedom and a desired confidence of
0.95, t₀ = 2.131. Starting with

$$P\left(-t_0 < \frac{\bar{x} - \mu}{s/\sqrt{n}} < t_0\right) = \text{desired confidence}$$

go through steps analogous to those in Section 18.5 to obtain the
confidence interval formula

$$\bar{x} \pm t_0 s/\sqrt{n}$$
3. Obtain a 95% confidence interval for the mean class height using the
data collected in Exercise 1. 

4. Gardeners are always interested in the average frost-free date for 
their locality. In a certain city, inspection of the weather records gave 
the last killing frost of the year for 20 years. 

March 10 April 15 April 13 April 22 March 28 

April 24 May 6 April 5 April 18 May 2 

April 5 April 9 March 26 April 14 April 17 

Assuming that the time of frost as measured from January 1 is 
normally distributed, obtain a 95% confidence interval for the 
average frost-free date. 

5. The Environmental Protection Agency has established maximum 
allowable increases in levels for air pollutants. In a certain city the 
maximum allowable 24-hour increase was 91 micrograms per cubic 
meter. Measurements at ten randomly selected sites in the city 
showed the following 24-hour increases: 89, 92, 84, 60, 76, 94, 78, 79,
74, and 81. Obtain a 95% confidence interval for the mean 24-hour increase. Do
you think the city was in violation of the standard? 

6. This exercise illustrates the concept of unbiasedness. Consider the 
finite population 4, 2, 15, 1, 8. List all 10 of the possible samples of 
size three that can be obtained in sampling without replacement. For 
each possible sample, calculate the sample mean, the sample median,
and the sample variance s².

a. Show that the sample mean is an unbiased estimate of the 
population mean by showing that the average of the 10 xs equals 
the mean of the population. 

b. Similarly, show that the sample s² is an unbiased estimate of the
population S².

c. Similarly, show that the sample median is not an unbiased 
estimate of the population mean. 

d. Show that the sample s is not an unbiased estimate of the 
population S. 

7. The utility company in a small town launched a campaign to help 
people conserve energy. Answering calls to conduct energy surveys 
of homes, a team from the utility company noted the thermostat 
settings. They were (°F) 78, 69, 72, 70, 67, 68, 69, 71, 66, 68, 70, 65, 71,
67, 68, 72, 69, 73, 64, and 70. Calculate a 90% confidence interval for
the mean thermostat setting. 

8. A statewide tax reform question was being submitted to the voters 
and a local survey was conducted to get an indication of voter 
preferences. Out of a random sample of 50 people, 29 favored the
proposal and 21 opposed it. Estimate the fraction favoring the 
question and calculate an approximate 95% confidence interval for 
the fraction favoring the question. 

9. A tire manufacturer is interested in the slump in sales in a particular 
tire model and surveys people who have purchased tires of that 
model in the last 4 years. Those who used the tires until replacing 
them by another model were asked about the miles of service. The
miles of service reported were:

22,325 24,320 18,260 20,200 

24,350 25,000 19,700 22,500 

23,250 24,000 22,000 21,900 

Calculate a 95% confidence interval for the mean number of miles of 
service. 

10. A cabinetmaker is using a new glue in a critical support joint in a 
kitchen cabinet and conducts some tests to assure himself of the 
strength of the glue joint. He glues two pieces of wood together and 
then uses weights to break the glue joint. The weight (pounds) 
required to break 10 glue joints is 545, 495, 625, 530, 585, 615, 580, 
615, 590, and 540. Calculate a 95% confidence interval on the mean 
weight required to break the glue joints. 

11. Ten pages were selected at random from a city telephone directory.
The numbers of names on these 10 pages were 399, 412, 388, 422, 396,
420, 410, 406, 392, and 395. Calculate a 95% confidence interval for
the number of names listed in the 900-page directory.

12. Estimate the fraction of printed material in this book devoted to 
exercises, calculate a confidence interval, and explain your methods. 
THE BIVARIATE NORMAL DISTRIBUTION


19.1 Great things are done when men and mountains meet;
This is not done by jostling in the street.


William Blake 


Look at the contour map of Stone Mountain, Georgia (Figure 19.1). It 
is not difficult to envision a smoothly rounded surface and a steep slope 
on the north side of the mountain. Although not as informative as a 
three-dimensional model, a contour map helps us to visualize the surface 
of the earth. Thinking of the earth’s surface, and of maps of it, provides a
useful way of describing bivariate data, data that arise from populations
in which each member exhibits two variables.


19.2 Bivariate frequency distributions are useful in describing bivariate populations. Table 19.1 is one of many bivariate frequency distributions given by Pearson and Lee [2]. Although the entries in the table are basically frequencies, they require some explanation because of fractional values. The data were originally recorded in 1-inch intervals. When an individual fell on the boundary of a cell, its frequency was split between the cells. To illustrate the idea, consider the three father-son pairs in Table 19.2. The pair (68.5, 76.5) lies on the corner shared by four cells, so its frequency of one is split by giving 0.25 frequency to each of the four cells, producing Table 19.3. These fractional frequencies are, in fact, the values recorded in Table 19.1.
Figure 19.1 Stone Mountain. (Reproduced from Figure V-8, Atlas of
Landforms by J. L. Scovel et al. Copyright © 1966 by John Wiley & Sons, Inc.
Also reprinted from the U.S. Geological Survey. Reprinted by permission from
John Wiley & Sons, Inc.) 


19.3 Imagine erecting at the center of each cell in Table 19.1 a vertical line 
proportional to the frequency of that cell. If the tops of the vertical lines 
are connected, we have a surface like the surface of a mountain. The 
beautiful drawing in Figure 19.2 is given by Yule and Kendall [5]. 


19.4 Consider describing a population as we have just done, not for 1078 
father-son pairs but for all of the millions of father-son pairs in the 
United States. Furthermore, imagine that instead of tabulating heights 
to the nearest inch, we tabulate to the nearest one-tenth of an inch, etc. 
For such bivariate frequency distributions, we construct three- 
dimensional pictures like that in Figure 19.2. As the measurements 
become more precise, the surface depicted will become smoother, and we 
can visualize the limiting surface to be like that in Figure 19.3. 
Table 19.1 Correlation between (1) Stature of Father and (2) Stature
of Son; 1 or 2 Sons only of each Father. Measurements in inches.
Source. Reproduced from Biometrika, 2, Karl Pearson and Alice Lee,
On the laws of natural inheritance in man. Copyright © 1903 by the 
Biometrika Trustees. Also reproduced from Table 11.3, An Introduction 
to the Theory of Statistics, 11th edition, by G. U. Yule and M. G. Kendall. 
Copyright © 1937 by Charles Griffin & Company, Ltd. Reprinted by 
permission from the Biometrika Trustees and Charles Griffin & 
Company, Ltd. 



Table 19.2 Frequencies

                        Father
Son           67.5-68.5    68.5    68.5-69.5
75.5-76.5         1
76.5                         1
76.5-77.5         1

Table 19.3 Splitting Frequencies

                     Father
Son           67.5-68.5    68.5-69.5
75.5-76.5        1.25         0.25
76.5-77.5        1.25         0.25
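The splitting rule behind Tables 19.2 and 19.3 can be written out as a short program. This is a minimal sketch, not Pearson and Lee's procedure: a value strictly inside a class counts fully, and a value lying exactly on an interior class boundary is split equally between the two neighboring classes, so a point on a corner contributes 0.25 to each of four cells. The exact heights of the two interior pairs are hypothetical (the text only says which cells they fall in); the pair (68.5, 76.5) is the one from the text.

```python
from collections import defaultdict
from itertools import product


def bin_weights(value, edges):
    """Map a measurement to {class_index: weight}.

    A value strictly inside a class gets weight 1; a value exactly on an
    interior class boundary is split equally between the adjacent classes.
    """
    for i in range(len(edges) - 1):
        if edges[i] < value < edges[i + 1]:
            return {i: 1.0}
    for i in range(1, len(edges) - 1):
        if value == edges[i]:
            return {i - 1: 0.5, i: 0.5}
    raise ValueError(f"{value} is outside the table or on its outer edge")


def tabulate(pairs, father_edges, son_edges):
    """Bivariate frequency table keyed by (son_class, father_class)."""
    table = defaultdict(float)
    for father, son in pairs:
        father_bins = bin_weights(father, father_edges).items()
        son_bins = bin_weights(son, son_edges).items()
        for (i, wf), (j, ws) in product(father_bins, son_bins):
            table[(j, i)] += wf * ws  # a corner point adds 0.25 to four cells
    return dict(table)


father_edges = [67.5, 68.5, 69.5]
son_edges = [75.5, 76.5, 77.5]
# Hypothetical exact heights for the two interior pairs; (68.5, 76.5) is from the text.
pairs = [(68.0, 76.0), (68.0, 77.0), (68.5, 76.5)]
print(tabulate(pairs, father_edges, son_edges))
```

Row 0 is the son class 75.5-76.5 and column 0 the father class 67.5-68.5, so the printed cells reproduce the 1.25 and 0.25 entries of Table 19.3.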
Figure 19.2 Frequency-Surface for Father-Son Data. (Reproduced from
Figure 11.3, Introduction to the Theory of Statistics, 11th edition, by G. U.
Yule and M. G. Kendall. Copyright © 1937 by Charles Griffin & Company Ltd. 
Reprinted by permission from Charles Griffin & Company Ltd.) 

Many bivariate data sets are well described by surfaces like that in 
Figure 19.3 and are said to be bivariate normal. Of course, we may prefer 
to describe such surfaces by contour maps. These maps usually will look 
like Figure 19.4, which consists of concentric ellipses. 

We will not present the mathematical equation for the bivariate 
normal. For the present we want to point out the following properties. 

1. The bivariate normal distribution surface is a bell-shaped surface 
that is symmetric about the means of both variables. 

2. If all of the frequencies are shoved to one axis, a univariate normal 
curve is obtained. For example, in the father-son data the stature of 
fathers only is obtained by ignoring the height of the sons and 
shoving all frequencies to the margin. A normal distribution for 
height of fathers is obtained. 

3. The cross section along any plane perpendicular to the axis for either 
variable is also a normal bell-shaped curve. 
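Properties 2 and 3 can be checked numerically. The sketch below is a minimal illustration under stated assumptions: it generates standardized pairs with the construction z2 = ρz1 + √(1 − ρ²)e, where z1 and e are independent standard normals (a standard way to produce a bivariate normal, not something the text uses), and a hypothetical ρ = 0.5. It then summarizes the z1 margin and a thin slice near z1 = 1. Means and standard deviations only hint at normality; a histogram of each list would show the bell shape.

```python
import math
import random


def mean_sd(values):
    """Mean and population standard deviation of a list of numbers."""
    m = math.fsum(values) / len(values)
    sd = math.sqrt(math.fsum((v - m) ** 2 for v in values) / len(values))
    return m, sd


rng = random.Random(19)
rho = 0.5  # hypothetical correlation, chosen only for illustration

# Standardized bivariate normal pairs via z2 = rho*z1 + sqrt(1 - rho^2)*e.
pairs = []
for _ in range(100_000):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
    pairs.append((z1, z2))

# Property 2: ignore z2, i.e. shove all frequencies to the z1 margin.
margin_mean, margin_sd = mean_sd([z1 for z1, _ in pairs])

# Property 3: a thin cross section perpendicular to the z1 axis, near z1 = 1.
slice_mean, slice_sd = mean_sd([z2 for z1, z2 in pairs if abs(z1 - 1.0) < 0.1])

print(round(margin_mean, 2), round(margin_sd, 2))  # close to 0 and 1
print(round(slice_mean, 2), round(slice_sd, 2))    # close to rho and sqrt(1 - rho^2)
```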
Figure 19.3 Limiting Form of the Frequency Surface. (Reproduced from
Figure 11.1, Introduction to the Theory of Statistics, 11th edition, by G. U.
Yule and M. G. Kendall. Copyright © 1937 by Charles Griffin & Company, 
Ltd. Reprinted by permission from Charles Griffin & Company Ltd.) 



Figure 19.4 Contour Map—Bivariate Normal. 
Walker [4] credits Adrain as being the first writer to mention a bivariate normal distribution, as early as 1808. Walker also credits Laplace (1810), Plana (1812), Gauss (1823), and Bravais (1846) with arriving at an equation for what we now call the bivariate normal distribution. Pearson [1] credits Galton (1885) as being the first to think of the bivariate normal surface as describing observational variables. However, Schols (1875) discussed the bivariate normal as describing horizontal and vertical errors in artillery fire. Galton’s work is of enormous importance because it represents a major beginning in the use of the bivariate normal in statistics.


SUMMARY. The use of contour maps to describe mountainous land 
surfaces provides a useful analogy for thinking about the distribution of 
bivariate data, data obtained by observing two variables for each 
member of the population. Classes, class intervals, and class marks are 
decided on for each variable. Bivariate frequency distributions are formed 
by recording the frequency for each pair of classes. 

A three-dimensional graph of the bivariate frequency distribution can 
be formed by plotting the frequency for each two-way class as the 
ordinate value. Then the population surface is easily visualized. 

The bivariate normal distribution has a bell-shaped surface. Furthermore, if the frequencies are considered on either margin, the marginal distribution is a univariate normal. The conditional distribution along any cross section parallel to either axis is also normal. The contours of the bivariate normal distribution are concentric ellipses.

The origin of the bivariate normal is somewhat obscure, but it was 
known at least as early as 1808. 
EXERCISES

1. A much better appreciation for figures such as Figure 19.2 usually 
results when one tries to construct a similar figure. Construct a 
frequency surface for the following small set of bivariate data on 
automobile engines. 


                    Net Horsepower
Net Torque    105-108    108-111    111-114
195-200          5          2          0
200-205          8          7          2
205-210          4         10          6
210-215          0          5          9


2. Repeat the Pearson and Lee inquiry. Try to get the heights on 100 
father-son pairs or 100 mother-daughter pairs. Construct a bivariate 
frequency distribution such as that in Table 19.1 and a frequency 
surface such as that in Figure 19.2. 

3. Using freehand drawing methods, draw contour lines on Table 19.1 
for frequencies 30, 25, 20, 15, and 10. Are the contour lines roughly 
elliptical in shape? 

4. A university department is administering student evaluation of 
instructors and keeps a stock of No. 2 lead pencils. A statistics student 
notices that the shorter, more used pencils also have shorter erasers. 
Measurement of 500 pencils gives the following data.


Pencil Length        Eraser Length (… inch)
(inch)          0-1     1-2     2-3     3-4
2-3              16       9       5       5
3-4              12      42      13       8
4-5              36      88      37      10
5-6               7      30      92      16
6-7               6       8      40      20


Construct a graph of this frequency distribution. Draw contours on 
this frequency distribution. 
5. A mail-order house analyzed the orders received from a large number
of mail customers. For customers ordering both shirts and shoes the 
shirt size (sleeve length) and shoe size (length) were recorded. 


                       Shirt Size
Shoe Size     32     33     34     35     36     38
7½-8          42     20      1      0      0      0
8½-9          40     86     42     17      0      0
9½-10         62    101     68     50     23      0
10½-11        15     76    125     53     43      1
11½-12         0      0     70    108     48      8
12½-13         0      0      5     69     86     30
13½-14         0      0      0     10     20     32


Graph the frequency distribution. Sketch contours for the frequency 
distribution. 

6. The manager of a grocery supermarket is able to study all sorts of 
relationships with his new accounting system. A study of the total 
grocery bill versus the amount for meat gives the following 
distribution. 




                     Total Bill (dollars)
Meat Bill
(dollars)     0-20    20-40    40-60    60-80    80-100
0-10           50       42       37       12        6
10-20          25       35       52       50       60
20-30           0       11       40       56       53
30-40           0        1       22       32       39


Graph the frequency distribution. 
CORRELATION


20.1 At length, one morning, while waiting at a roadside station near Ramsgate for a train, and poring over the diagram in my notebook, it struck me that the lines of equal frequency ran in concentric ellipses. The cases were too few for certainty, but my eye, being accustomed to such things, satisfied me that I was approaching the solution.*

Galton, in his Memories of My Life [1, p. 302], gives this account of his being led to a family of concentric ellipses like those we encountered in the previous lecture. Galton goes on to tell about consulting J. D. Hamilton Dickson, tutor at St. Peter’s College, Cambridge, to obtain the mathematical equation for the bivariate normal that would give rise to such concentric ellipses.


20.2 It is necessary, if we are to share anything of Galton’s understanding (nearly 100 years ago), to have a rudimentary knowledge of straight lines and ellipses.

The equation of a straight line (any straight line) has the form 


Y = a + bX 

where a and b are constants. The constants a and b have simple geometrical interpretations. The constant a is the Y-intercept, the distance from the origin at which the line intercepts the Y-axis. The constant b is the slope of the line, where

$$\text{Slope} = \frac{\text{vertical difference between two points}}{\text{horizontal difference between two points}}$$

* Reprinted by permission of the publishers, Methuen, from Memories of My Life by Francis Galton.

In Figure 20.1 four straight lines are shown. Students should verify 
that the slopes and intercepts are as given. 
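The two definitions can be turned into a small calculation: given any two points on a line, the slope is the ratio of differences, and the intercept follows by substituting one point into Y = a + bX. This is a minimal sketch; the points in the usage line are hypothetical (the lines of Figure 20.1 are not reproduced here).

```python
def line_through(p1, p2):
    """Return (a, b) for the line Y = a + bX through two points with different X values."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope and cannot be written as Y = a + bX")
    b = (y2 - y1) / (x2 - x1)  # slope: vertical difference / horizontal difference
    a = y1 - b * x1            # Y-intercept: the value of Y when X = 0
    return a, b


print(line_through((0, 3), (2, 7)))   # (3.0, 2.0): intercept 3, slope 2
print(line_through((1, 1), (3, -3)))  # (3.0, -2.0): a falling line
```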

In Lecture 19 we noted that the contours of the father-son frequency 
data were concentric ellipses. It is extremely helpful in understanding the 
correlation coefficient to have a little knowledge of the equation of an 
ellipse. Most students realize that an ellipse is an oval, egg-shaped figure 
but that not just any oval will do. One interesting way to draw an ellipse 
is to attach two ends of a piece of string to a piece of paper (with some 
slack in the string), to take up the slack with a pencil, and to move the 
pencil while keeping the string tight. This is shown in Figure 20.2. 

The two lines of symmetry (more commonly the parts of these lines 
within the curve) are called the major and minor axes. The major axis is 
the longer of the two segments; the minor axis is the shorter. 


Figure 20.2 Construction of an Ellipse.

The general equation of any ellipse centered at the origin is given by

$$Ax^2 + 2Bxy + Cy^2 = D$$

with

$$AC - B^2 > 0$$

Specific ellipses are generated by assigning different values to A, B, C, and D. For example, if we let A = 1, B = 0, C = 1, and D = 4, we have a circle centered at the origin with radius 2.
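The constants A, B, C, and D determine the lengths of the major and minor semi-axes. A minimal sketch, using a standard fact not stated in the text: along each principal direction the curve lies at distance √(D/λ) from the origin, where λ is an eigenvalue of the symmetric array [[A, B], [B, C]]. The check that D has the same sign as A is an added assumption (otherwise no points satisfy the equation).

```python
import math


def ellipse_semi_axes(A, B, C, D):
    """Semi-axis lengths (major, minor) of A x^2 + 2B xy + C y^2 = D centered at the origin."""
    if A * C - B * B <= 0:
        raise ValueError("AC - B^2 must be positive for an ellipse")
    if D / A <= 0:
        raise ValueError("D must have the same sign as A, or the curve is empty")
    # Eigenvalues of the symmetric array [[A, B], [B, C]].
    centre = (A + C) / 2
    half_gap = math.sqrt(((A - C) / 2) ** 2 + B * B)
    lam_small, lam_large = centre - half_gap, centre + half_gap
    # The smaller eigenvalue gives the longer (major) semi-axis.
    return math.sqrt(D / lam_small), math.sqrt(D / lam_large)


print(ellipse_semi_axes(1, 0, 1, 4))  # (2.0, 2.0): the circle of radius 2
print(ellipse_semi_axes(1, 0, 4, 4))  # (2.0, 1.0): x^2 + 4y^2 = 4
```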

20.3 If we now follow the step taken by Galton of standardizing our variables, we can get a very simple measure of association, or correlation, between the standardized variables by examining the elliptical contours of the bivariate normal. Let the standardized variables be z1 and z2. Then these contours are described by the elliptical equation

$$z_1^2 - 2\rho z_1 z_2 + z_2^2 = D$$

where

$$1 - \rho^2 > 0$$

By varying the value of D, we get a family of ellipses concentric to the one in Figure 20.3.

If ρ is positive, the major axis lies in the OB direction. If ρ is negative, the major axis lies in the OA direction. If ρ = 0, the contours are circles. Thus ρ (rho), called the correlation coefficient, provides a measure of association between the standardized variables z1 and z2 (and, therefore, between the unstandardized variables X and Y).

Figure 20.3 Typical Contour of Bivariate Normal: Standardized Variables.

Figure 20.4 Interpretation of ρ.

If ρ > 0, X will tend to be large when Y is large. If ρ < 0, X will tend to
be large when Y is small. If ρ is close to +1 or -1, the relationship is very
strong. The situation is depicted in Figure 20.4.
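The effect of ρ on the contour shape can be computed directly. For the equation above, the principal directions are always the rising 45-degree line (direction (1, 1)) and the falling 45-degree line (direction (1, −1)); substituting z1 = z2 = t shows the contour crosses the rising line at distance √(D/(1 − ρ)) from the origin, and similarly the falling line at √(D/(1 + ρ)). This minimal sketch does not try to identify which of these is labeled OA or OB in Figure 20.3.

```python
import math


def correlation_ellipse(rho, D=1.0):
    """Semi-axes of z1^2 - 2*rho*z1*z2 + z2^2 = D along the rising and falling 45-degree lines."""
    if not -1 < rho < 1:
        raise ValueError("need 1 - rho^2 > 0")
    along_rising = math.sqrt(D / (1 - rho))   # direction (1, 1)
    along_falling = math.sqrt(D / (1 + rho))  # direction (1, -1)
    return along_rising, along_falling


for rho in (-0.5, 0.0, 0.5, 0.9):
    rising, falling = correlation_ellipse(rho)
    print(rho, round(rising, 3), round(falling, 3))
```

For ρ > 0 the longer semi-axis lies along the rising line, for ρ < 0 along the falling line, and for ρ = 0 the two are equal (a circle); as ρ approaches 1 the ellipse becomes long and thin.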


20.4 We have been discussing the correlation coefficient ρ as a parameter appearing in the equation for the bivariate distribution. The effect of ρ on the shape of the elliptical contours gives a way of thinking about the relationship between two normal variables X and Y. There is an alternative way of describing ρ that is equivalent to the one already given for bivariate normal populations and that is equally valid for nonnormal populations. Pearson [2] is generally credited with having extended the use of correlation to nonnormal populations, and this formulation is often called the Pearson product-moment correlation.

Consider a population in which a large X is usually associated with a
large Y and a small X with a small Y. In terms of standardized variables
z1 and z2, the scatter diagram would appear as in Figure 20.5.


Figure 20.5 Product-Moment Correlation.
In quadrants I and III, the product z1z2 would be positive; it would be negative in quadrants II and IV. Because there are more points in quadrants I and III than in quadrants II and IV, the average (or expected) product would be positive. If the scatter points tended to have a negative slope, the average value of z1z2 would be negative. It seems logical, then, to take this average product as a measure of association. It is shown in books on mathematical statistics that if we have a bivariate normal, this average product is identically equal to ρ as already defined. That is,

$$\rho = \text{population average of } z_1 z_2 = \operatorname{Ave}(z_1 z_2)$$

In terms of the original variables,

$$\rho = \frac{\operatorname{Ave}\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$
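The claim that the average product equals ρ can be illustrated by simulation. This minimal sketch generates standardized bivariate normal pairs by the construction z2 = ρz1 + √(1 − ρ²)e with z1 and e independent standard normals (an assumption of the sketch, not the text's method) and averages z1z2 over many pairs; the estimates should land near the chosen ρ, including its sign.

```python
import math
import random


def average_product(rho, n=100_000, seed=20):
    """Monte Carlo estimate of Ave(z1 * z2) for a standardized bivariate normal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0)
        total += z1 * z2  # positive in quadrants I and III, negative in II and IV
    return total / n


for rho in (-0.6, 0.0, 0.8):
    print(rho, round(average_product(rho), 3))
```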


20.5 Given a sample, the most obvious way of estimating ρ is to calculate the sample equivalent. Denoting our estimate by r, let

$$r = \frac{(1/n)\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{(1/n)\sum (X_i - \bar{X})^2 \; (1/n)\sum (Y_i - \bar{Y})^2}}$$


Example.

   X     Y    X - X̄   Y - Ȳ   (X - X̄)(Y - Ȳ)   (X - X̄)²   (Y - Ȳ)²
   2     1    -0.4    -2.4         0.96            0.16        5.76
   3     6     0.6     2.6         1.56            0.36        6.76
   1     0    -1.4    -3.4         4.76            1.96       11.56
   4     8     1.6     4.6         7.36            2.56       21.16
   2     2    -0.4    -1.4         0.56            0.16        1.96
Total
  12    17     0.0     0.0        15.20            5.20       47.20

X̄ = 2.4     Ȳ = 3.4

$$r = \frac{15.20}{\sqrt{(5.20)(47.20)}} = 0.9702$$
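The arithmetic in the example can be reproduced in a few lines of Python. This is a minimal sketch of the formula for r given above, applied to the example's five (X, Y) pairs; because the 1/n factors cancel, it works directly with the sums shown in the Total row.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation r as defined in Section 20.5 (the 1/n factors cancel)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 15.20 in the example
    sxx = sum((x - x_bar) ** 2 for x in xs)                        # 5.20
    syy = sum((y - y_bar) ** 2 for y in ys)                        # 47.20
    return sxy / math.sqrt(sxx * syy)


X = [2, 3, 1, 4, 2]
Y = [1, 6, 0, 8, 2]
print(round(pearson_r(X, Y), 4))  # 0.9702
```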
20.6 Assuming that X and Y are from a bivariate normal, it is very easy to test the null hypothesis

H0: ρ = 0

versus the alternative hypothesis

H1: ρ ≠ 0

The sample correlation coefficient is simply calculated, and its absolute value is compared with the value given in Table A.6.

Example. For the example in Section 20.5, r = 0.9702. From Table A.6, the tabled value for n = 5 is 0.878. Since the calculated value exceeds the tabled value, we conclude (at the 0.05 level) that ρ ≠ 0.
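Table A.6 is not reproduced in this section, but its entries can be cross-checked. The sketch below relies on the standard identity linking r to Student's t, t = r√(n − 2)/√(1 − r²), so the critical value of r is t*/√(t*² + n − 2), where t* is the two-sided 5% point of t with n − 2 degrees of freedom. The t* values are typed in from ordinary t tables (an assumption of this sketch, limited here to n − 2 ≤ 10). For n = 5 the result agrees with the 0.878 quoted above.

```python
import math

# Two-sided 5% points of Student's t, keyed by degrees of freedom (from standard t tables).
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def r_critical(n):
    """Smallest |r| that rejects H0: rho = 0 at the 0.05 level for n pairs (3 <= n <= 12)."""
    t = T_CRIT_05[n - 2]
    return t / math.sqrt(t * t + n - 2)


def t_statistic(r, n):
    """Equivalent t statistic for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


print(round(r_critical(5), 3))             # 0.878
print(round(t_statistic(0.9702, 5), 1))    # 6.9, well above t* = 3.182
```

Either comparison, |r| against the critical r or t against t*, leads to the same decision.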


SUMMARY. In searching for an explanation of association between two variables, Galton was led to the correlation coefficient. The correlation coefficient, ρ, has a very natural and simple explanation as a parameter of the equation generating the elliptical contours of the bivariate normal distribution. If X1 and X2 are closely related and tend to increase (or decrease) together, it seems reasonable that the contours are long, thin ellipses with a positive slope. The equation gives ρ a value close to +1. If X1 and X2 are closely related and X1 tends to increase as X2 decreases, the slope is negative and ρ is close to -1.

Pearson extended the use of correlation to nonnormal populations; for this reason the correlation coefficient is often called the Pearson product-moment correlation.

The sample correlation coefficient, r, is quite variable, and inferences
are suspect with small sample sizes. Testing the hypothesis that ρ = 0
is very easy mechanically but requires a moderately large sample size to
be of value.
EXERCISES

1. Graph and give equations for the following straight lines. 

a. y-intercept = 3, slope = 2.

b. y-intercept = -2, slope = 3.

c. y-intercept = -3, slope = -4.

d. y-intercept = -4, slope = -2.

2. Sketch the ellipses for standardized variables z1 and z2 given by the equation

$$z_1^2 - 2\rho z_1 z_2 + z_2^2 = 1$$

for ρ = -0.5, -0.2, 0.2, and 0.5.

3. Television audiences in two neighboring cities noted that the weather 
forecasts for the two cities were often markedly different. However, 
their experience indicated that the weather in one city was remarkably
similar to that of the other city. The following are midday
temperature readings (°F) in the two cities over a 24-day period. 


City X    City Y        City X    City Y
 80.2      83.5          97.4      99.4
 83.1      88.0          97.2     101.2
 82.3      85.7          91.3      94.8
 87.3      90.4          90.9      93.1
 89.1      93.2          91.4      94.7
 88.7      89.6          92.3      96.0
 90.2      94.5          90.9      94.2
 95.3      98.7          89.5      91.5
 98.4     101.6          86.4      89.8
 98.7     103.6          84.7      87.1
 98.6     102.3          85.8      80.8
 98.9      99.2          84.0      85.6


a. Calculate the correlation coefficient r. 

b. Using Table A.6, test the hypothesis (at the 0.05 level) that ρ = 0.

c. Plot the temperature readings on a scattergram.

d. How would you describe the association between temperatures in the two cities?
4. Satellite data are used to estimate the acreage given to different kinds
of land use. The percent of land planted to wheat estimated at 12 sites 
and estimates obtained from ground records were as follows. 


Site    Satellite    Ground
  1        14          12
  2        11          24
  3         4           0
  4        45          36
  5        19          18
  6         8           5
  7        23          16
  8        24          30
  9        23          28
 10        36          22
 11        35          24
 12        23          29


a. Calculate r.

b. Test H0: ρ = 0 versus H1: ρ ≠ 0.

5. An interior decorator shops for brass candlesticks. It quickly becomes 
obvious that there is a relationship between the price and the reported 
age. We are given the following data for X = price in dollars and 
Y = age in years. 


  X      Y
 85      5
125    150
165    170
140    150
120    125


Calculate the sample correlation coefficient, r, and test H0: ρ = 0. Use
α = .05.

6. In a crowded classroom building the elevators receive heavy service. During the break between classes, there are people waiting to go up, X, and people waiting to go down, Y. The building engineer collects some data on the numbers of people waiting. The data are as follows.


 X     Y
32    28
16    18
48    36
52    35
36    28
38    39
43    40
52    34


Calculate the sample correlation coefficient, r, and test the hypothesis
that ρ = 0. Use α = .05. Are the results reasonable?

7. A salary survey in a particular profession asks for annual salary and 
age (among other variables). The data reported are as follows. 


Age (X)    Salary (Y)
           (thousands of dollars)
  30          22
  41          28
  52          38
  38          40
  48          34
  56          35
  44          32


Calculate the sample correlation coefficient, r, and test H0: ρ = 0. Use
α = .05. Are salary and age related?
GALTON'S ILLUSTRATION OF CORRELATION. A pictorial representation of the ideas of regression and correlation. Courtesy of the Museum of Natural History.
21.1 Forrest [1] gives a good account of the wide range of Galton’s interests
and activities. When he was in his forties, Galton turned his energies to 
the study of human heredity, in the process making contributions not 
only to genetics but also to statistical method and theory. 

In his Memories [3], Galton describes a problem that puzzled him. 

How is it possible for a population to remain alike in its features, as a whole, during many successive generations, if the average produce of each couple resemble their parents? Their children are not alike, but vary: therefore some would be taller, some shorter than their average height; so among the issue of a gigantic couple there would be usually some children more gigantic still. Conversely as to very small couples. But from what I could thus far find, parents had issue less exceptional than themselves.*

In the 1870s data representing two generations of a human population
were not available and, at the suggestion of Sir Joseph Hooker and
Charles Darwin, Galton decided to experiment with sweet peas.

21.2 In 1875 Galton selected a large number of pea seeds and sorted them into seven different weight groupings. He then persuaded friends living in various parts of England to plant 70 seeds, 10 from each of the weight groups, according to a set of minute instructions. The foliage of each crop was returned to Galton, providing him with detailed data on two generations of sweet peas. The details of the experiment were presented in 1877 [2]. Forrest gives the essential results of this experiment in Table 21.1.

* Reprinted by permission of E. P. Dutton and the publishers, Methuen & Co., from
Memories of My Life by Francis Galton.
Table 21.1 Sweet Pea Experiment

Diameter of Seed (0.01 inch)

Parent      15     16     17     18     19     20     21
Daughter   15.4   15.7   16.0   16.3   16.6   17.0   17.3

Source. From Francis Galton: The Life and Work of a Victorian Genius by D. W. Forrest (Taplinger, 1974). Copyright © 1974 by Paul Elek (Scientific Books) Ltd. Reprinted by permission.

Note that the daughters of the dwarf peas are less dwarfish than the parents and the daughters of the giant peas are smaller than the parents. Galton called this phenomenon reversion [2]. He says that “Reversion is the tendency of the ideal mean filial type to depart from the parental type, reverting to what may roughly and perhaps fairly be described as the average ancestral type.” Galton later replaced the word reversion with the term regression.
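Fitting a straight line to Table 21.1 makes the reversion concrete. This is a minimal sketch that uses the least-squares line of daughter diameter on parent diameter (the method of Lecture 22, swapped in here for illustration; the text does not describe how Galton himself summarized the table). A slope of 1 would mean daughters reproduce parental deviations in full; a slope near 0 would mean parents have no influence.

```python
def least_squares_line(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


parent = [15, 16, 17, 18, 19, 20, 21]                 # Table 21.1, units of 0.01 inch
daughter = [15.4, 15.7, 16.0, 16.3, 16.6, 17.0, 17.3]
intercept, slope = least_squares_line(parent, daughter)
print(round(slope, 3))  # 0.318
```

The fitted slope is about 0.32: on average a daughter seed keeps only about a third of its parent's deviation from the mean diameter, which is the reversion Galton described.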

21.3 If we express variables in standard units X' and Y' (obtained by subtracting the mean and dividing by the standard deviation), the phenomenon Galton observed can be described by Figure 21.1. Instead of data points X' and Y' being distributed about the 45-degree line AB, they tended to be distributed about the line CD. Finally, by inspection of his crude data and with mathematical assistance from friends, he came to see that a feasible explanation is given by the elliptical contours of the bivariate normal.

Figure 21.1 Galton's Correlation Diagram. (Horizontal axis: X' (parent); vertical axis: Y' (offspring).)

In Figure 21.1 we have shown a typical elliptical contour of a bivariate normal population expressed in terms of standardized variables. The major axis lies along the 45-degree line AB. Galton erected vertical lines tangent to the ellipse and connected the points of tangency C and D. Similarly, horizontal tangents produced the line EF. The logic is as follows. For any specified X', the conditional distribution of Y' is normal, with the mean as the most likely value. Along the vertical line at a given X', the point of tangency lies on the innermost contour that the line reaches, so it is the point of highest frequency; therefore it gives the mean of the conditional distribution. Because the elliptical contours are concentric and of the same shape, all of the points of tangency lie on a straight line. Thus we have the remarkable fact that the means of the conditional distributions of Y' lie on the line CD. Similarly, the means of the conditional distributions of X' lie on the line EF.

The significance of the foregoing comments to the user of statistics is that there are three possible population regression lines for data arising from a bivariate normal distribution. If the (X', Y') pairs were selected at random from the population, we probably would be interested in the AB line. If the X' values were specified and the corresponding Y' values observed (as with the parent-offspring data), we would be interested in the CD line, called the regression line of Y on X. If, on the other hand, the values of Y' were specified and the X' values observed, we would be interested in the EF line, the regression line of X on Y.

21.4 In standardized units the equation for the mean value of Y' as a function of X' is simply

Y' = ρX'

By substitution, we can obtain the equation in terms of unstandardized variables.

$$\frac{Y - \mu_Y}{\sigma_Y} = \rho \, \frac{X - \mu_X}{\sigma_X}$$

$$Y = \mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X + \rho \frac{\sigma_Y}{\sigma_X} X$$
Using simpler notation, we may prefer to write

Y = α + βX

where $\beta = \rho\sigma_Y/\sigma_X$ and $\alpha = \mu_Y - \beta\mu_X$.

It is helpful to visualize not only the regression line but all of the conditional distributions, as in Figure 21.2. Thus for each X there is a normal population of Y values with mean α + βX and variance $\sigma^2_{Y|X}$. The means lie on the straight line Y = α + βX, and the variance is the same for every population. In fact, $\sigma^2_{Y|X} = \sigma_Y^2(1 - \rho^2)$. Thus the conditional variance of Y is smaller than the unconditional variance of Y by the amount $\rho^2\sigma_Y^2$, that is, by the fraction ρ². Considerable significance is often attached to this fact (particularly in social science statistics), and ρ² is said to be the fraction of variation explained by the regression.
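The formulas of Section 21.4 can be packaged as a small function. This is a minimal sketch with hypothetical parameter values (they are not estimates from the Pearson-Lee table). It returns α, β, and the conditional variance σ²Y(1 − ρ²); the last line confirms that the fraction of variance removed by conditioning on X is ρ².

```python
def regression_of_y_on_x(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Population regression line of Y on X and conditional variance for a bivariate normal."""
    beta = rho * sigma_y / sigma_x
    alpha = mu_y - beta * mu_x
    conditional_variance = sigma_y ** 2 * (1 - rho ** 2)
    return alpha, beta, conditional_variance


# Hypothetical parameters, chosen only for illustration.
alpha, beta, cond_var = regression_of_y_on_x(mu_x=68.0, mu_y=69.0, sigma_x=2.7, sigma_y=2.7, rho=0.5)
print(alpha, beta, round(cond_var, 4))      # 35.0 0.5 5.4675
print(round(1 - cond_var / 2.7 ** 2, 4))    # 0.25, which is rho squared
```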


Figure 21.2 Bivariate Normal Regression.


21.5 Although the bivariate normal is of historical importance in the study of regression, it no longer plays a central role. Quite frequently it is not necessary to suppose that there is a bivariate normal population. It may be reasonable to suppose that for each X there is a normal population of Ys, that the means lie on a straight line, and that the variance is the same for all Y populations.




SUMMARY. The puzzling association of the word regression with drawing straight lines through clusters of points can be explained by studying Galton’s original use of the word. He observed that in many instances the characteristics of the parents tended to be passed on to the children, but that there was a reversion or regression toward the overall population mean. The word has gradually become associated with straight-line data of all types.

Given data from a bivariate normal distribution, there are three regression lines of interest: the major axis of the ellipse, the line joining the points of tangency of horizontal tangents, and the line joining the points of tangency of vertical tangents. These lines have the following statistical interpretations: the regression line for random selection of (X, Y) pairs, the regression line for Xs selected at random for specified Ys, and the regression line for Ys selected at random for specified Xs.

The last of these interpretations receives considerable emphasis and can be restated as follows. For each X, there is a population of Y values with mean α + βX and constant variance $\sigma^2_{Y|X}$.
ADRIEN MARIE LEGENDRE. 1752-1833. Courtesy Columbia
University Library, D. E. Smith Collection. 
THEORY OF LEAST SQUARES


22.1 For 200 years the method of least squares has been known to scientists 
and has become the primary method by which curves and equations are 
fitted to data. Much research in recent years has been directed toward 
finding better methods of fitting, but least squares remains dominant. 
Perhaps this is because the idea, once acquired, seems intuitive. 

In some of the simplest problems the method of least squares seems to 
be nothing more than choosing the most direct solution. Years ago I 
heard the following illustration given by Professor George Polya in a 
lecture at Oklahoma State University. If you airlifted a cow to the middle 
of a lake and deposited her there, what route would she take in 
swimming to shore? 

I have not studied such reactions from cows, but it seems reasonable that most such “dumb” animals would swim to point A in preference to a more distant point on the shore (Figure 22.1). Furthermore, it seems reasonable that the destination chosen on the shore would be near the point closest to where the swim began [map coordinates (x1, y1)]. In minimizing the distance to the shore, suppose that our cow knew algebra and that she knew the distance from (x1, y1) to a point (x2, y2) on the shore to be

$$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

To minimize the distance, she simply chooses (x2, y2) to minimize D. Clearly, if D² is minimized, so is D, so the square root need not be considered, and all that is required is to choose (x2, y2) so as to minimize

$$(x_1 - x_2)^2 + (y_1 - y_2)^2$$

It seems very natural to call this the method of least squares (actually the minimum of a sum of squares).
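A small numerical check of the cow's shortcut: ranking candidate shore points by distance and ranking them by squared distance picks out the same point, because the square root never changes the order of nonnegative numbers. The starting point and shore points below are hypothetical, and the same functions work unchanged for points with three or more coordinates.

```python
import math


def squared_distance(p, q):
    """Sum of squared coordinate differences; works in 2, 3, or more dimensions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distance(p, q):
    """Ordinary (Euclidean) distance: the square root of the sum of squares."""
    return math.sqrt(squared_distance(p, q))


start = (0.0, 0.0)  # hypothetical point where the swim begins
shore = [(3.0, 4.0), (5.0, 1.0), (-2.0, 6.0), (1.0, -7.0)]  # hypothetical shore points

nearest_by_distance = min(shore, key=lambda q: distance(start, q))
nearest_by_squares = min(shore, key=lambda q: squared_distance(start, q))
print(nearest_by_distance, nearest_by_squares)  # (3.0, 4.0) (3.0, 4.0)
```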
Figure 22.1 Method of Least Squares.

22.2 In order for the concept to be useful, it is necessary to extend the number of dimensions. The geometric interpretation works quite nicely for three dimensions because the distance between two points (x1, y1, z1) and (x2, y2, z2) in three dimensions is

$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

and least squares simply minimizes the quantity under the square root sign. For more than three dimensions we increase the number of terms under the radical, although the interpretation then becomes purely algebraic, aided by pictures and graphs in two and three dimensions.


22.3 Gauss is widely credited with devising the method of least squares. However, it seems to have been discovered at about the same time by Gauss, Laplace, Legendre, and others. According to Plackett [2], it was first named and published by Legendre in 1806 in Nouvelles méthodes pour la détermination des orbites des comètes (New Methods for the Determination of Comet Orbits). Abbe [1] believes that Gauss invented the method as early as 1795, but that Legendre was the first to publish it, in 1806. Abbe goes on to argue that Robert Adrain discovered the method independently and published it in the Analyst in 1808 in response to a prize problem in that journal.


22.4 The basic concept of least squares can (and should) be perceived apart 
from the methods used to achieve the minimum value. The specific 
minimization method used may be quite complex, making use of 
algorithms from nonlinear programming, or relatively simple, using 
first-year calculus. Simpler still, minimization may be achieved by using 
only algebra or by trial and error without any mathematics. It is not the
purpose of this lecture to give a survey of minimization methods, but 
some understanding of the process is desirable. Therefore we will give 
some examples of minimization by elementary means. 


22.5 Consider one of the simplest problems in statistics. Given $n$ observations from a population with mean $\mu$, find the least squares estimate of $\mu$. That is, find the value $\hat{\mu}$ that minimizes

$$(x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \cdots + (x_n - \mu)^2 = \sum_{i=1}^{n} (x_i - \mu)^2$$

By elementary algebra, adding and subtracting $\bar{x}$ inside each term, we rewrite this quantity as follows.

$$\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x} + \bar{x} - \mu)^2$$

$$= \sum_{i=1}^{n} (x_i - \bar{x})^2 + 2(\bar{x} - \mu) \sum_{i=1}^{n} (x_i - \bar{x}) + n(\bar{x} - \mu)^2$$

Now, because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, we have

$$\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$

On the right side of the equation the first term does not involve $\mu$. The second term is clearly smallest (it equals zero) when $\mu = \bar{x}$. Therefore the least squares estimate of $\mu$ is given by $\hat{\mu} = \bar{x}$. The method of least squares thus gives us the intuitively desirable prescription of estimating the population mean by the sample mean. Furthermore, this result is obtained by elementary algebra.
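The algebra can also be checked numerically. The sketch below uses a hypothetical sample (3, 7, 4, 6, 5) with mean 5 and evaluates $S(\mu) = \sum (x_i - \mu)^2$ over a grid of trial values. It then confirms two things: the grid minimum falls at $\bar{x}$, and $S(\mu)$ equals $\sum (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$ for several values of $\mu$.

```python
def sum_of_squares(xs, mu):
    """S(mu): the sum of squared deviations of the observations from mu."""
    return sum((x - mu) ** 2 for x in xs)

xs = [3, 7, 4, 6, 5]      # hypothetical sample
n = len(xs)
xbar = sum(xs) / n        # sample mean, 5.0

# Trial values mu = 3.00, 3.01, ..., 7.00
trials = [3 + k / 100 for k in range(401)]
best_mu = min(trials, key=lambda mu: sum_of_squares(xs, mu))

# Check the identity S(mu) = sum (x_i - xbar)^2 + n (xbar - mu)^2
within = sum_of_squares(xs, xbar)
for mu in (3, 4.5, 5, 6.2, 7):
    assert abs(sum_of_squares(xs, mu) - (within + n * (xbar - mu) ** 2)) < 1e-9

print(xbar, best_mu, within)   # 5.0 5.0 10.0
```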


22.6 Let us consider another example of minimizing a sum of squares by elementary means. In yacht racing, the effective time (used for deciding the order of finish) is obtained by multiplying the elapsed time by a rating factor for the boat. If we let

Y = effective time

and

X = elapsed time

then

$$Y = \beta X$$

Let us suppose that a particular boat has raced five times and that both its elapsed times and its effective times are known. They are as follows.

X     Y
2.1   2.3
2.2   2.5
3.4   3.6
6.4   7.1
5.2   5.8


We want to use the data to get a least squares estimate of $\beta$. That is, we want to minimize

$$S = \sum_{i=1}^{5} (Y_i - \beta X_i)^2$$

with respect to $\beta$, the rating factor of the boat. One simple way to proceed is by trial and error.

Inspection of the data suggests that $\beta$ is about 1.1. The small deviations in Table 22.1 suggest that our initial trial of $\beta = 1.1$ is very close to the value that would minimize $S$. Three of the $1.1X$ values are too small and two are too large, so we might try a slightly larger value for $\beta$. In this way we can, by trial and error, find the least squares estimate to any number of decimal places required.


Table 22.1 Trial Calculations: $\beta = 1.1$

X     Y     1.1X    Y - 1.1X   (Y - 1.1X)²
2.1   2.3   2.31    -0.01      0.0001
2.2   2.5   2.42     0.08      0.0064
3.4   3.6   3.74    -0.14      0.0196
6.4   7.1   7.04     0.06      0.0036
5.2   5.8   5.72     0.08      0.0064
Total                          0.0361
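The trial-and-error search can be mechanized. The sketch below starts from the trial value 1.1, tries 21 nearby values of $\beta$, and keeps whichever gives the smallest $S$. It then repeats the search on a grid ten times finer. This grid-refinement scheme is an illustrative choice, not a procedure described in the text. It relies on $S$ having a single minimum, which holds here because $S$ is a quadratic in $\beta$.

```python
X = [2.1, 2.2, 3.4, 6.4, 5.2]  # elapsed times
Y = [2.3, 2.5, 3.6, 7.1, 5.8]  # effective times

def S(beta):
    """Sum of squared deviations of Y from beta * X."""
    return sum((y - beta * x) ** 2 for x, y in zip(X, Y))

beta, step = 1.1, 0.1   # initial trial value suggested by inspecting the data
for _ in range(5):
    # Try 21 values centered on the current best and keep the one with smallest S.
    candidates = [beta + k * step for k in range(-10, 11)]
    beta = min(candidates, key=S)
    step /= 10          # refine the grid by one more decimal place

print(round(S(1.1), 4), round(beta, 4))   # 0.0361 1.1054
```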
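22.7 Although calculus can be used to obtain a least squares formula in the example just considered, it is not necessary. We can, in fact, obtain it by elementary algebra as follows.

$$S = \sum (Y_i - \beta X_i)^2 = \sum Y_i^2 - 2\beta \sum X_i Y_i + \beta^2 \sum X_i^2$$

Let us now complete the square in $\beta$, treating $\beta$ as the variable and the quantities in $X$ and $Y$ as coefficients.

$$S = \sum X_i^2 \left( \beta^2 - 2\beta \frac{\sum X_i Y_i}{\sum X_i^2} \right) + \sum Y_i^2$$

$$= \sum X_i^2 \left( \beta - \frac{\sum X_i Y_i}{\sum X_i^2} \right)^2 + \sum Y_i^2 - \frac{(\sum X_i Y_i)^2}{\sum X_i^2}$$

Only the first term in the last expression involves $\beta$, and it is never negative. It is clear, then, that this term, and therefore $S$, is minimized by taking $\beta = \sum X_i Y_i / \sum X_i^2$. The minimum value is

$$S_{\min} = \sum Y_i^2 - \frac{(\sum X_i Y_i)^2}{\sum X_i^2}$$

Therefore the least squares estimate is given by

$$\hat{\beta} = \frac{\sum X_i Y_i}{\sum X_i^2}$$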

In Table 22.2 the calculations for obtaining $\hat{\beta}$ are given. We can see that our trial and error result was, in fact, quite close.

Table 22.2 Least Squares Calculation

X     Y     X²      XY
2.1   2.3    4.41    4.83
2.2   2.5    4.84    5.50
3.4   3.6   11.56   12.24
6.4   7.1   40.96   45.44
5.2   5.8   27.04   30.16
Total       88.81   98.17

$$\hat{\beta} = \frac{\sum XY}{\sum X^2} = \frac{98.17}{88.81} = 1.1054$$
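The closed-form result takes only a few lines to evaluate. The sketch below reproduces the column totals of Table 22.2 and computes $\hat{\beta} = \sum XY / \sum X^2$. It also evaluates the minimum value of $S$ from the completed-square expression of Section 22.7. That minimum, about 0.0335, is slightly below the 0.0361 obtained with the trial value 1.1.

```python
X = [2.1, 2.2, 3.4, 6.4, 5.2]  # elapsed times
Y = [2.3, 2.5, 3.6, 7.1, 5.8]  # effective times

sum_x2 = sum(x * x for x in X)               # column total of X^2
sum_xy = sum(x * y for x, y in zip(X, Y))    # column total of XY
sum_y2 = sum(y * y for y in Y)

beta_hat = sum_xy / sum_x2                   # least squares estimate
s_min = sum_y2 - sum_xy ** 2 / sum_x2        # minimum value of S

print(round(sum_x2, 2), round(sum_xy, 2), round(beta_hat, 4), round(s_min, 4))   # 88.81 98.17 1.1054 0.0335
```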
22.8 In summary, the method of least squares minimizes an appropriate sum of squares. Many methods can be used to achieve the minimization.


SUMMARY. In the late eighteenth and early nineteenth centuries the method of least squares was popularized by Gauss, Legendre, Laplace, and others. Despite efforts to find better methods, it remains a fundamental method of data analysis.

The intuitive idea is that of the shortest distance from a fixed point to a region. The distance formula from analytic geometry takes the square root of a sum of squares, so the shortest distance is found by minimizing a sum of squares. Quite naturally, this method is called the method of least squares.

The least squares estimate of $\mu$ is given by minimizing $\sum (x - \mu)^2$ with respect to $\mu$. By several different methods, it can be seen that this is achieved by $\hat{\mu} = \bar{x}$. Thus the sample mean is the least squares estimate of the population mean.

Given the simple regression model $Y = \beta X$, the least squares estimate of $\beta$ requires minimizing $\sum (Y - \beta X)^2$. This gives $\hat{\beta} = \sum XY / \sum X^2$. Usually $X$ is plotted on the horizontal axis and $Y$ on the vertical axis, so the quantity minimized is the sum of squares of the vertical deviations from the line.
EXERCISES

1. A good understanding of least squares can be developed by empirical studies. Consider the problem in Section 22.5. Given the data 8, 10, 12, 9, 21, evaluate $S = \sum (x_i - \mu)^2$ for $\mu$ = 10, 11, 12, 13, 14. Graph the values of $S$ obtained versus $\mu$. What value of $\mu$ would appear to minimize $S$? What value is indicated by Section 22.5?

2. Evaluate $S$ of Section 22.6 for values of $\beta$ = 1.00, 1.05, 1.10, 1.15, 1.25. Graph $S$ versus $\beta$. What value of $\beta$ seems to minimize $S$?
3. Plot the following data on engineering graph paper, X on the horizontal axis, Y on the vertical.

X    Y
10   7
9    9
8    10
10   9
6    5
5    6
5    4
3    3
3    4
1    4


Using a straight edge, draw a line that looks reasonable through the 
data. Measure the vertical deviations from the line you have drawn 
and calculate the sum of squares of deviations. Repeat the process 
until you feel confident that your line is close to the line that 
minimizes the sum of squares of vertical deviations. 

4. Make a three-dimensional plot of the following data, using the 
vertical axis for Y. 


X1   X2   Y
1    2    21
1    3    25
1    4    29
2    2    24
2    3    28
2    4    32


Draw a plane that seems reasonable, measure the vertical deviations 
from the plane, and calculate the sum of squares of deviations. 
Proceeding in this way, find the plane by trial and error that 
minimizes the sum of squares of deviations. 



ESTIMATION OF REGRESSION PARAMETERS


23.1 In the preceding lectures we have developed the idea of a population 
regression line and have introduced the method of least squares as a way 
of estimating the population parameters. In this lecture we illustrate 
many of the working formulas with a numerical example. 

23.2 I have frequently recorded the first and last examination scores for 
students in my classes. Often these sets of scores appear to follow the 
simple linear regression model. The following data are extracted from 
the examination scores of a large class. 


First Exam (X)   Last Exam (Y)
63               68
65               75
72               70
73               76
80               81
85               78
86               89
93               84


We will assume that

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

and that the errors ($\varepsilon$) are normally and independently distributed with mean zero and variance $\sigma^2$. We will now give formulas and illustrate the computations for this example for:

(1) estimates of $\beta_0$, $\beta_1$, and $\sigma^2$,
(2) variances of $\hat{\beta}_0$ and $\hat{\beta}_1$,
(3) confidence intervals for $\beta_0$ and $\beta_1$, and
(4) tests of significance for $\beta_0$ and $\beta_1$.


23.3 In Table 23.1 we give the basic calculations for all that follows. Of 
course, many calculators are now available that will give the column 
totals of Table 23.1. With such a calculator there is no need to write the 
individual entries in the table. From the column totals, we obtain the 
next set of basic quantities. 

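$$\bar{X} = 617/8 = 77.125$$

$$\bar{Y} = 621/8 = 77.625$$

$$\sum (X - \bar{X})^2 = \sum X^2 - (\sum X)^2/n = 48{,}377 - 47{,}586.125 = 790.875$$

$$\sum (Y - \bar{Y})^2 = \sum Y^2 - (\sum Y)^2/n = 48{,}547 - 48{,}205.125 = 341.875$$

$$\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - (\sum X)(\sum Y)/n = 48{,}323 - 47{,}894.625 = 428.375$$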


Table 23.1 Linear Regression Calculations

X      Y      X²       Y²       XY
63     68     3,969    4,624    4,284
65     75     4,225    5,625    4,875
72     70     5,184    4,900    5,040
73     76     5,329    5,776    5,548
80     81     6,400    6,561    6,480
85     78     7,225    6,084    6,630
86     89     7,396    7,921    7,654
93     84     8,649    7,056    7,812
Total
617    621    48,377   48,547   48,323
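The column totals of Table 23.1 and the corrected sums of Section 23.3 can be checked with a short script. The integer sums are exact. Every correction term also has denominator 8, so the floating-point results here are exact as well.

```python
X = [63, 65, 72, 73, 80, 85, 86, 93]  # first exam scores
Y = [68, 75, 70, 76, 81, 78, 89, 84]  # last exam scores
n = len(X)

# Column totals of Table 23.1
sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))

# Basic quantities of Section 23.3 (computing formulas)
x_bar, y_bar = sum_x / n, sum_y / n
sxx = sum_x2 - sum_x ** 2 / n        # sum of (X - Xbar)^2
syy = sum_y2 - sum_y ** 2 / n        # sum of (Y - Ybar)^2
sxy = sum_xy - sum_x * sum_y / n     # sum of (X - Xbar)(Y - Ybar)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)   # 617 621 48377 48547 48323
print(x_bar, y_bar, sxx, syy, sxy)            # 77.125 77.625 790.875 341.875 428.375
```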
23.4 The least squares estimates of $\beta_0$ and $\beta_1$ are found (using one of the minimization methods in Lecture 22) to be given by the following formulas.

$$\hat{\beta}_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

For our examination score example, we use the quantities obtained in Section 23.3 to obtain

$$\hat{\beta}_1 = \frac{428.375}{790.875} = 0.5416$$

$$\hat{\beta}_0 = 77.625 - (0.5416)(77.125) = 35.8541$$

So the least squares line is given by the equation

$$\hat{Y} = 35.8541 + 0.5416X$$

This equation is graphed in Figure 23.1 along with the original data.


It may not be necessary to follow the calculations through in detail in order to understand what the line in Figure 23.1 is. Many calculators and computer programs are now available that give $\hat{\beta}_0$ and $\hat{\beta}_1$ directly. I feel that students should follow through the details of the calculations a few times in their education. The calculations should not, however, be allowed to obscure the fundamental nature of the least squares line: it is the line for which the sum of squares of vertical deviations from the line is a minimum.


23.5 What about these deviations? Each one is found by calculating the $Y$ value on the line, denoted by $\hat{Y}$ ("Y hat"), and subtracting it from the corresponding actual $Y$ value. These calculations are given in Table 23.2. As a computational check, the total for the $\hat{Y}$s must equal the total for the $Y$s, and the deviations must sum to zero. Slight discrepancies may arise because of rounding error. In the last column of Table 23.2 the deviations are squared and summed. Actually, this total is more easily obtained from the formula

$$\sum d^2 = \sum (Y - \bar{Y})^2 - \frac{\left[\sum (X - \bar{X})(Y - \bar{Y})\right]^2}{\sum (X - \bar{X})^2}$$

This gives

$$\sum d^2 = 341.875 - \frac{(428.375)^2}{790.875} = 109.8470$$

as obtained from the table (except for rounding error).


Table 23.2 Vertical Deviations

X     Y     Ŷ = 35.8541 + 0.5416X   d = Y - Ŷ    d²
63    68    69.9749                 -1.9749      3.9002
65    75    71.0581                  3.9419     15.5386
72    70    74.8493                 -4.8493     23.5157
73    76    75.3909                  0.6091      0.3710
80    81    79.1821                  1.8179      3.3048
85    78    81.8901                 -3.8901     15.1329
86    89    82.4317                  6.5683     43.1426
93    84    86.2229                 -2.2229      4.9413
Total
617   621   621.0000                 0.0000    109.8471
The estimate of $\sigma^2$ commonly used in statistics is

$$s^2 = \sum d^2/(n - 2) = 109.8470/6 = 18.3078$$

The divisor of $n - 2$ deserves some comment. In general, the divisor for estimating $\sigma^2$ is equal to $n$ minus the number of parameters needed to specify the mean value. In this case $\beta_0$ and $\beta_1$ specify the mean value of $Y$ for a particular $X$, so the divisor is $n - 2$.
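The checks described above and the shortcut formula can be confirmed with a short script. The checks are that the fitted values total 621 and the deviations total zero. The script uses the rounded estimates 35.8541 and 0.5416, as Table 23.2 does. The directly summed squares therefore agree with the shortcut value closely but not to every decimal place.

```python
X = [63, 65, 72, 73, 80, 85, 86, 93]
Y = [68, 75, 70, 76, 81, 78, 89, 84]
n = len(X)
b0, b1 = 35.8541, 0.5416                 # rounded estimates from Section 23.4

y_hat = [b0 + b1 * x for x in X]         # points on the fitted line
d = [y - yh for y, yh in zip(Y, y_hat)]  # vertical deviations
sse = sum(di ** 2 for di in d)           # sum of squared deviations

# Shortcut formula of Section 23.5, using the centered sums of Section 23.3
syy, sxy, sxx = 341.875, 428.375, 790.875
sse_shortcut = syy - sxy ** 2 / sxx

s2 = sse_shortcut / (n - 2)              # two parameters estimated, so divide by n - 2

print(round(sum(y_hat), 4), round(sse, 4), round(sse_shortcut, 4), round(s2, 4))
```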


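23.6 The least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are calculated from the data and would not be constant in repeated sampling from the same population. Therefore we are interested in their variances. These variances are derived in theoretical statistics and are given by

$$\sigma^2_{\hat{\beta}_0} = \frac{\sigma^2 \sum X^2}{n \sum (X - \bar{X})^2}$$

$$\sigma^2_{\hat{\beta}_1} = \frac{\sigma^2}{\sum (X - \bar{X})^2}$$

Because $\sigma^2$ is unknown, the variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ are unknown. But we have an estimate of $\sigma^2$ and can, therefore, estimate the variances of $\hat{\beta}_0$ and $\hat{\beta}_1$. Denote these estimated variances by $s^2_{\hat{\beta}_0}$ and $s^2_{\hat{\beta}_1}$. These variance estimates are given by

$$s^2_{\hat{\beta}_0} = \frac{s^2 \sum X^2}{n \sum (X - \bar{X})^2} = \frac{(18.3078)(48{,}377)}{(8)(790.875)} = 139.9836$$

with the estimated standard deviation of

$$s_{\hat{\beta}_0} = \sqrt{139.9836} = 11.8315$$

and

$$s^2_{\hat{\beta}_1} = \frac{s^2}{\sum (X - \bar{X})^2} = \frac{18.3078}{790.875} = 0.0232$$

with the estimated standard deviation of

$$s_{\hat{\beta}_1} = \sqrt{0.0232} = 0.1523$$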
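The variance formulas translate directly into code. In this sketch $s^2$ is recomputed from the shortcut formula of Section 23.5 rather than typed in. Carried at full precision, the estimated variance of $\hat{\beta}_1$ is about 0.02315 and its square root about 0.1521. Both are slightly below the 0.0232 and 0.1523 shown above. The printed values appear to reflect rounding the variance before taking the square root, and the sections that follow use 0.1523.

```python
import math

n, sum_x2 = 8, 48377                   # sample size and sum of X^2 (Table 23.1)
sxx, sxy, syy = 790.875, 428.375, 341.875

s2 = (syy - sxy ** 2 / sxx) / (n - 2)  # estimate of sigma^2, about 18.3078

var_b0 = s2 * sum_x2 / (n * sxx)       # estimated variance of the intercept
var_b1 = s2 / sxx                      # estimated variance of the slope
se_b0, se_b1 = math.sqrt(var_b0), math.sqrt(var_b1)

print(round(var_b0, 2), round(se_b0, 4))   # 139.98 11.8315
print(round(var_b1, 5), round(se_b1, 4))   # 0.02315 0.1521
```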


23.7 The purpose of this lecture is to acquaint students with the calculations, but some interpretation of the quantities calculated seems essential. No useful physical interpretation can be given to $\hat{\beta}_0$ in this example. It is simply an estimate of the Y-intercept. In the present example, the Y-intercept is the population mean score on the final examination of all students scoring zero on the first examination. For most classes this will not be a meaningful concept.

On the other hand, $\hat{\beta}_1$ gives an estimate of the change in final score per unit change in the first score. An increase of 10 on the first score would correspond to an estimated increase of 5.416 in the final score.

From Figure 23.1 it is apparent that there is considerable variation of the data about the least squares line. This variation results in sizable estimates of the variances of $\hat{\beta}_0$ and $\hat{\beta}_1$. With $s_{\hat{\beta}_0} = 11.8315$ we might well find 30 or 40 acceptable values for $\beta_0$. Also, with $s_{\hat{\beta}_1} = 0.1523$ we should not attach great importance to the least squares estimate $\hat{\beta}_1 = 0.5416$ but should realize that the true value might be somewhat different.

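A more formal way to approach the question of reasonable values of $\beta_0$ and $\beta_1$ is to use tests of significance. For the null hypothesis

$$H_0: \beta_0 = K$$

versus the alternative hypothesis

$$H_1: \beta_0 \neq K$$

where $K$ is the hypothesized value, we calculate the test statistic

$$t_{\text{calc}} = \frac{\hat{\beta}_0 - K}{s_{\hat{\beta}_0}}$$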
Using a t table with $n - 2 = 6$ degrees of freedom, we calculate the significance level

$$SL = P(|t| > |t_{\text{calc}}|)$$

For null hypotheses concerning $\beta_1$, we perform analogous computations.

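Suppose we have

$$H_0: \beta_1 = 0.4$$

and

$$H_1: \beta_1 \neq 0.4$$

Then

$$t_{\text{calc}} = \frac{\hat{\beta}_1 - 0.4}{s_{\hat{\beta}_1}} = \frac{0.5416 - 0.4}{0.1523} = 0.9297$$

and

$$SL = 0.39$$

Because this is a fairly large value, we find the hypothesized value of $\beta_1$ in good agreement with the data.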


23.8 It will be apparent to the thoughtful student that $\beta_1 = 0.4$ is not the only value in agreement with the data. A recommended procedure is to calculate the significance level, SL, for various hypothesized values of $\beta_1$ and to graph SL versus the hypothesized values. Values in agreement with (or in consonance with) the data can then be read from the graph.
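Here is a minimal sketch of the significance-level calculation, written without a statistics library. It obtains the two-sided SL by integrating the t density numerically with Simpson's rule. That numerical approach is a choice made here to keep the example self-contained; a t table or a library routine would serve equally well. The loop at the end computes SL for a range of hypothesized slopes, which are the values one would graph as recommended above. The function names are illustrative.

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_sl(t_calc, df, steps=2000):
    """SL = P(|t| > |t_calc|), via Simpson's rule on [0, |t_calc|] (steps must be even)."""
    a, b = 0.0, abs(t_calc)
    h = (b - a) / steps
    total = t_density(a, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(a + i * h, df)
    central = total * h / 3                  # P(0 < t < |t_calc|)
    return 1 - 2 * central

b1, se_b1, df = 0.5416, 0.1523, 6            # values from Sections 23.4 and 23.6

t_calc = (b1 - 0.4) / se_b1
print(round(t_calc, 4), round(two_sided_sl(t_calc, df), 2))   # 0.9297 0.39

# Significance levels for a range of hypothesized slopes (the values to graph)
for k in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(k, round(two_sided_sl((b1 - k) / se_b1, df), 3))
```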

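Most textbooks present such a set of values as an interval for only one significance level. For example, if we specify a significance level of 0.05, the interval (a 95% confidence interval) is given by

$$\hat{\beta}_1 \pm t_{0.025}\, s_{\hat{\beta}_1}$$

where $t_{0.025} = 2.4469$ is the tabulated t value with 6 degrees of freedom. For the present example, this is

$$0.5416 \pm (2.4469)(0.1523)$$

or

$$0.5416 \pm 0.3727$$

or

0.1689 to 0.9143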
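The interval arithmetic is a single multiplication. The short sketch below takes the tabulated $t_{0.025} = 2.4469$ as given rather than computing it. The helper name `confidence_interval` is illustrative.

```python
def confidence_interval(estimate, std_dev, t_table):
    """Estimate plus or minus t times its estimated standard deviation."""
    half_width = t_table * std_dev
    return estimate - half_width, estimate + half_width

b1, se_b1 = 0.5416, 0.1523   # slope estimate and its estimated standard deviation
t_table = 2.4469             # t with 6 degrees of freedom, 0.025 in each tail

lower, upper = confidence_interval(b1, se_b1, t_table)
print(round(upper - b1, 4), round(lower, 4), round(upper, 4))   # 0.3727 0.1689 0.9143
```

The same helper applies to the intercept: pass $\hat{\beta}_0$ and $s_{\hat{\beta}_0}$ in place of the slope values.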
23.9 Least squares regression lines are widely used, and observant students will find many examples. For example, Hume [2] gives the equation obtained by Binford for the pipe stem data of Table 5.5 as $Y = 1931.85 - 38.26X$, where $X$ is the hole diameter and $Y$ is the date. This equation can be used for dating pipe stem fragments whose bore diameters can be measured.

The book called The Body [6] gives regression equations relating the lengths of arm and leg bones to the height of the person, although it does not give the data from which the equations were obtained. Anthropologists can use these equations to estimate what the living height of a person might have been from the measurement of one bone.

Nigel [5] made use of a least squares line in studying the behavior of the mink in searching for food. He found a linear relationship between search duration and the duration of subsequent pursuits.

As a final example, we mention the article of Lave and Seskin [4], which discusses the use of regression in relating air pollution to the human mortality rate.


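SUMMARY. This lecture illustrates the computational details of estimation, of calculating confidence intervals, and of performing tests for the parameters of the simple linear regression model

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

The least squares estimates of $\beta_0$ and $\beta_1$ are given by

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \quad \text{and} \quad \hat{\beta}_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$

The estimate of $\sigma^2$ is given by the sum of squares of vertical deviations, divided by $n - 2$. Formulas are given for $s^2_{\hat{\beta}_0}$ and $s^2_{\hat{\beta}_1}$, the estimated variances of $\hat{\beta}_0$ and $\hat{\beta}_1$.

The 95% confidence intervals for $\beta_0$ and $\beta_1$ are $\hat{\beta}_0 \pm t_{0.025} s_{\hat{\beta}_0}$ and $\hat{\beta}_1 \pm t_{0.025} s_{\hat{\beta}_1}$, respectively.

Hypothesized values for the intercept and slope are tested by the t test, with t values given by

$$t = \frac{\hat{\beta}_0 - K}{s_{\hat{\beta}_0}} \quad \text{and} \quad t = \frac{\hat{\beta}_1 - K}{s_{\hat{\beta}_1}}$$

respectively, where $K$ is the hypothesized value. In both cases the degrees of freedom are $n - 2$.

Examples are given of the use of simple linear regression in archaeology, anthropology, animal behavior, and ecology.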
EXERCISES

1. The record times for the mile run from 1911 to 1975 are as follows.

Year   Time     Year   Time
1911   4:15.4   1945   4:01.4
1913   4:14.6   1954   3:59.4
1915   4:12.6   1954   3:58.0
1923   4:10.4   1957   3:57.2
1931   4:09.2   1958   3:54.5
1933   4:07.6   1962   3:54.4
1934   4:06.8   1964   3:54.1
1937   4:06.4   1965   3:53.6
1942   4:06.2   1966   3:51.3
1942   4:04.6   1967   3:51.1
1943   4:02.6   1975   3:51.0
1944   4:01.6   1975   3:49.4


a. Convert the times to seconds.

b. Plot a scattergram of the record time (Y) versus the year (X).

c. Assuming the straight line model $Y = \beta_0 + \beta_1 X + \varepsilon$, estimate $\beta_0$, $\beta_1$, and $\sigma^2$.

d. Calculate the significance level for $H_0: \beta_1 = -0.4$ versus $H_1: \beta_1 \neq -0.4$.

e. Predict the record time for the mile run in the year 2000.

f. What reservations do you have about using a straight line?
2. The following figures on total federal tax revenue, in billions of dollars, are given by the 1976 Statistical Abstract of the United States.

Year   Revenue   Year   Revenue
1950    44       1969   200
1955    72       1970   206
1960   100       1971   203
1965   126       1972   223
1968   165       1973   248


a. Plot a scattergram of tax (Y) versus the year (X).

b. Estimate $\beta_0$, $\beta_1$, and $\sigma^2$.

c. Calculate the significance level for $H_0: \beta_1 = 10$ versus $H_1: \beta_1 \neq 10$.

d. Predict the total tax for the year 2000.

e. What reservations do you have about using a straight line?

3. Calculate the least squares line for the data of Table 5.5. Does your 
equation agree with the equation given by Hume? 

4. For each of the students in your class, obtain X, the length of the 
radius (the arm bone from the elbow to the wrist) and Y, the height. 
Calculate the least squares regression line. 

5. The total federal outlay in billions of dollars for national defense and veterans' benefits was as follows for 1965 to 1972.

Year   Outlay
1965   54
1966   62
1967   76
1968   86
1969   88
1970   88
1971   87
1972   88


a. Plot the data. 

b. Calculate the least squares straight line. 

c. Using the least squares line, what would you predict for the outlay 
in 1977? The actual outlay was 119. 

6. The time required to construct a house is related to the complexity and, therefore, the price of the house. The following data represent completion times, X, in months, and the price, Y, in thousands of dollars.

X    Y
12   100
10   60
6    50
7    60
10   90
11   110
12   120
8    70
7    60

Using the least squares line, about what should a new house cost if the contractor is estimating a construction time of 10½ months?

7. In response to a market survey, several motorists report the tire pressure that they strive to maintain and the number of miles of service received on the original tires. The average pressure in pounds is indicated by X and the miles (thousands) by Y.

X    Y
29   22.5
28   21.5
31   25.0
32   26.3
27   21.2
26   20.8
30   24.0


a. Calculate the least squares line.

b. Test the null hypothesis that $\beta_1 = 1.0$ versus the alternative that $\beta_1 \neq 1.0$. Use $\alpha = .05$.

c. What mileage would you expect at 29 pounds pressure?




ENGLISH CLAY PIPES. Predicting the age of the pipe from the diameter of the pipe stem bore. U.S. Tobacco Museum, Greenwich, Connecticut.
























PREDICTION 


24.1 In the previous lecture we concerned ourselves primarily with the mechanics of obtaining the least squares line. We also presented methods of forming opinions about reasonable values for the $\beta$s in light of the data.

One of the very important uses of regression lines is the prediction of future $Y$ values. We cite as examples or possible examples the prediction of:

(1) college grade point average, $Y$, from entrance examination scores, $X$,
(2) salesmen productivity, $Y$, from aptitude test scores, $X$, and
(3) weight gain of cattle, $Y$, from number of days on ration, $X$.

The most reasonable single predicted value of $Y$ is simply

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

We may also wish to give an interval of reasonable values.

A second problem concerns estimating the mean of the $Y$ population for a specified $X$. Again, the single estimated value is simply

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

but the interval of reasonable values differs from the one appropriate for a single $Y$ value. For the examination score example of Lecture 23 we might be interested in the mean final score of the population of students scoring 70 on the first examination. This is obviously different from being interested in the final score of a single student scoring 70 on the first examination, and the intervals are different.

A third problem is to give intervals for the mean of all Y populations 
simultaneously. This is equivalent to specifying a region in which the 
entire true regression line lies with reasonable assurance. 
A fourth problem of importance in some areas of application is the estimation of an unknown $X$ for a known $Y$ value. This is the so-called problem of calibration, which arises in the following way. Two measurements, $X$ and $Y$, are available on a number of specimens. The measurement $X$ is quite precise, but it is difficult or expensive. The measurement $Y$ is less precise, but it is easy or inexpensive. The least squares line $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ is obtained and is used to estimate the $X$ value from measuring $Y$ on future specimens.

24.2 Suppose that an instructor has been teaching large lecture sections for several years and, while using standardized examinations, has obtained a regression line relating course averages ($Y$) to the first examination scores. A student has scored 70 on the first examination and is thinking about dropping the course. The instructor uses the regression line and the first score of the student to obtain a prediction interval for the student's course average score. The end points of the interval are given by

$$\hat{\beta}_0 + \hat{\beta}_1 X_0 \pm t\, s \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}} \qquad (1)$$

where $X_0$ is the $X$ value of interest (70 in this case), $t$ has $n - 2$ degrees of freedom, and the other quantities are calculated from the regression data.

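We carry through the calculations using the data from Lecture 23, letting $Y$ from that data represent average course scores. Then

$\bar{X} = 77.125$, $\sum (X - \bar{X})^2 = 790.875$
$n = 8$, $s^2 = 18.3078$
$\hat{\beta}_0 = 35.8541$, $\hat{\beta}_1 = 0.5416$

With $s = \sqrt{18.3078} = 4.2788$, $t = 2.4469$, and

$$\sqrt{1 + \frac{1}{8} + \frac{(70 - 77.125)^2}{790.875}} = 1.0905$$

the end points of a 95% prediction interval are given by

$$35.8541 + (70)(0.5416) \pm (2.4469)(4.2788)(1.0905)$$

or

$$73.7661 \pm 11.4173$$

or

62.3488 and 85.1834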
Thus the 95% prediction interval for the student in question is from 62.3488 to 85.1834. In a real situation such a prediction interval could be of value in counseling a student. The student might well question the assurance with which we predict the final score to lie between the specified limits. We could tell him or her that, given the assumptions, 95% of all such prediction intervals for students with an initial score of 70 would contain their course average.
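The prediction interval of Equation 1 can be sketched as a small function. It takes $t = 2.4469$ from a t table rather than computing it. It carries $s$ and the square-root factor at full precision, so its end points may differ from 62.3488 and 85.1834 in the fourth decimal place. The function name `prediction_interval` is illustrative.

```python
import math

# Quantities carried over from Lecture 23
n, x_bar, sxx = 8, 77.125, 790.875
s = math.sqrt(18.3078)          # estimated standard deviation, about 4.2788
b0, b1 = 35.8541, 0.5416
t_table = 2.4469                # t with n - 2 = 6 degrees of freedom, 95%

def prediction_interval(x0):
    """Equation 1: interval for a single future Y at X = x0."""
    y0 = b0 + b1 * x0
    half = t_table * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y0 - half, y0 + half

low, high = prediction_interval(70)
print(round(low, 3), round(high, 3))
```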


24.3 Suppose that we wished not to predict for a single student but to estimate the mean course score of all students with an initial score of 70. It seems quite intuitive that we should be able to locate the mean score more precisely than the score of a single student. In fact, the confidence interval for the $Y$ population mean at a given $X$ value is

$$\hat{\beta}_0 + \hat{\beta}_1 X_0 \pm t\, s \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}} \qquad (2)$$

It is immediately apparent that this formula is almost the same as the one in the previous section. Closer inspection reveals that the “1” under the radical sign in the previous equation has been removed. So the quantity under the radical sign is smaller, and to get the end points of the confidence interval we add a smaller quantity to $\hat{Y}$ and subtract it from $\hat{Y}$. Carrying the calculations through for 95% confidence, we find

$$\sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}} = \sqrt{\frac{1}{8} + \frac{(70 - 77.125)^2}{790.875}} = 0.4350$$


so the end points of our interval are

$$73.7661 \pm (2.4469)(4.2788)(0.4350)$$

or

$$73.7661 \pm 4.5544$$

or

69.2117 and 78.3205

In the context of our example we would be 95% confident that the course average for the population of students scoring 70 on the first examination lies between 69.2117 and 78.3205. So, in fact, the interval for the $Y$ population mean is considerably shorter than the interval for a single predicted observation.
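As a sketch, Equation 2 is the same computation as Equation 1 without the leading 1 under the radical. The code below makes that explicit with a flag, an illustrative design choice rather than anything in the text, and compares the two half-widths at several first-exam scores. The interval for the mean is narrowest at $X_0 = \bar{X}$. As before, $t = 2.4469$ is taken from a t table, and full-precision arithmetic may shift the last digit slightly.

```python
import math

# Quantities carried over from Lecture 23
n, x_bar, sxx = 8, 77.125, 790.875
s = math.sqrt(18.3078)
b0, b1, t_table = 35.8541, 0.5416, 2.4469

def half_width(x0, include_single=False):
    """Half-width of Equation 2, or of Equation 1 when include_single is True."""
    extra = 1.0 if include_single else 0.0
    return t_table * s * math.sqrt(extra + 1 / n + (x0 - x_bar) ** 2 / sxx)

def mean_interval(x0):
    """Equation 2: confidence interval for the Y population mean at X = x0."""
    y0 = b0 + b1 * x0
    h = half_width(x0)
    return y0 - h, y0 + h

low, high = mean_interval(70)
print(round(low, 3), round(high, 3))

# Half-widths for the mean (Equation 2) and for a single Y (Equation 1)
for x0 in (63, 70, x_bar, 93):
    print(x0, round(half_width(x0), 3), round(half_width(x0, include_single=True), 3))
```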



Both of the formulas given by Equations 1 and 2 are intended to be used with a single $X$ value. If they are used with two or more values, the confidence to be attached to 95% intervals is not 95% for all of the intervals jointly. However, a formula does exist for setting joint confidence intervals on the means of the $Y$ populations for all $X$, i.e., on $\beta_0 + \beta_1 X_0$ for all $X_0$. The formula is

$$\hat{\beta}_0 + \hat{\beta}_1 X_0 \pm \sqrt{2F}\, s \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}} \qquad (3)$$

where $F$ is the tabulated F value for the desired level of confidence with 2 degrees of freedom for the numerator and $n - 2$ degrees of freedom for the denominator.

Because this is our first mention of F, some further comments seem in order. The F distribution is concentrated on the nonnegative numbers and is skewed to the right. Although the derivation is due to Fisher [2], one of the first tables was due to Snedecor [3], and the distribution is widely known as Snedecor's F. The distribution is tabled in Table A.7, where $\nu_1$ = numerator degrees of freedom and $\nu_2$ = denominator degrees of freedom.

It should be apparent that a wider interval is required if we wish to set confidence limits simultaneously on all values of $\beta_0 + \beta_1 X$ than if we are satisfied with one $X$ only. Comparison of Equations 2 and 3 shows that this does happen. In Equation 3 we multiply by $\sqrt{2F_{\text{tab}}}$ instead of $t_{\text{tab}}$. Because $\sqrt{F_{\text{tab}}}$ is of roughly the same size as $t_{\text{tab}}$, most of the difference between Equations 2 and 3 comes from the factor $\sqrt{2}$. Thus Equation 3 gives a noticeably wider interval than Equation 2. In the present example the multiplier is 3.2073 instead of 2.4469, or about 31% larger.

Carrying through the arithmetic for $X_0 = 70$ with the current example, we find $\sqrt{2F} = 3.2073$ for 95% confidence ($F = 5.14$ with 2 and 6 degrees of freedom). The end points of the interval are given by

$$73.7661 \pm (3.2073)(4.2788)(0.4350)$$

or

$$73.7661 \pm 5.9697$$

or

67.7964 and 79.7358
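The simultaneous band of Equation 3 can be sketched the same way. The multiplier $\sqrt{2F} = 3.2073$ is copied from the text rather than computed. The loop evaluates the band across the observed range of first-exam scores, which is the region expected to contain the entire population regression line. The function name `interval` is illustrative.

```python
import math

n, x_bar, sxx = 8, 77.125, 790.875      # from Lecture 23
s = math.sqrt(18.3078)
b0, b1 = 35.8541, 0.5416
t_table = 2.4469          # multiplier in Equation 2
band_mult = 3.2073        # sqrt(2F) from the text, F with 2 and 6 degrees of freedom

def interval(x0, multiplier):
    """Fitted value at x0 plus or minus multiplier * s * sqrt(1/n + (x0 - x_bar)^2 / sxx)."""
    y0 = b0 + b1 * x0
    half = multiplier * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return y0 - half, y0 + half

# Simultaneous band (Equation 3) across the observed range of first-exam scores
for x0 in range(63, 94, 10):
    lo, hi = interval(x0, band_mult)
    print(x0, round(lo, 2), round(hi, 2))

print(round(band_mult / t_table, 3))   # 1.311: Equation 3 intervals are about 31% wider
```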
SUMMARY. Implicitly or explicitly, formally or informally, regression lines are used to predict $Y$ for values of $X$ other than those observed. Confidence intervals are calculated for:

(1) a single future $Y$ value for a specified $X$,
(2) the $Y$ population mean for a specified $X$, and
(3) the $Y$ population means for all $X$ simultaneously (the entire population regression line).

The three formulas are as follows, where $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_0$.

$$\hat{Y} \pm t\, s \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}} \qquad (1)$$

$$\hat{Y} \pm t\, s \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}} \qquad (2)$$

$$\hat{Y} \pm \sqrt{2F}\, s \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}} \qquad (3)$$

Equation 1 for individual $Y$ values gives a wider interval than Equation 2 for a mean $Y$ value. Equation 3 for the entire regression line gives wider intervals than Equation 2 for a single mean $Y$ value.
EXERCISES

1. Using the data from Exercise 1 of Lecture 23, construct a 95% confidence interval for the mile run record time for the year 1990 (use Equation 1). Construct a 95% confidence interval for the record time in 1990 using Equations 2 and 3 and give interpretations.

2. Using the data from Exercise 2 of Lecture 23, construct a 95% confidence interval for the total federal tax revenue in the year 2000.

3. Using the data of Table 5.5, construct a 95% confidence interval for 
the date of a pipe for which the pipe stem measurement is ^4 inch. 
4. Using the data of Exercise 4 of Lecture 23, construct a 95% confidence
interval for the height of a person with a radius measurement of 11 
inches and for the average height of all people with a radius 
measurement of 11 inches. 

5. An office supervisor carefully monitors the work of typists under different working conditions. The average number of errors per page during the day and the average office temperature that day are recorded for several different typists. The data are:

Temperature (X)   Errors (Y)
68.2              4.0
69.1              3.9
1^.2              3.4
64.3              5.2
68.2              4.1
69.3              4.8
65.0              6.1
68.2              4.4
70.3              3.6
66.7              6.2
68.3              4.2
69.2              4.6


Calculate a 95% confidence interval for the average number of errors 
per page for all typists and for a single typist in an office kept at 65°F. 

6. In a storewide sale at a large department store, the following prices (dollars), regular (X) and sale (Y), were observed on a sample of items:

X       Y
15.00   13.50
12.50   11.00
56.00   35.00
75.00   42.00
89.00   47.50
77.00   42.00
95.00   57.50
25.00   20.00
37.50   25.00
49.50   29.50
Calculate a 95% confidence interval for the sale price of a single item
regularly priced at $44.50. 

7. A monthly magazine reports the average monthly temperature (°F) and the average relative humidity (%) for several American cities. Let X = temperature and Y = relative humidity. Calculate the 95% confidence interval for the average relative humidity for a single city with an average monthly temperature of 60°F.

X    Y    X    Y
35   57   68   74
38   63   66   78
43   59   59   79
40   68   58   72
45   67   54   66
46   74   53   74
50   61   49   69
ROTHAMSTED MANOR HOUSE. Ancestral home of Sir John Bennet Lawes, founder of Rothamsted Experimental Station. Reproduced from a print from the Royal Commission on Historical Monuments, England. Reprinted by permission
THE LEGACY OF ROTHAMSTED AND AMES


25.1 Sir John Bennet Lawes (1814-1900) was born in the Rothamsted Manor
House near Harpenden, England [2]. After studying at Oxford, he
returned to the manor, which had been in his family for 200 years, to take
up the management of the family farm in 1834. Because of his interest in
chemistry, he shortly began to conduct experiments on the effect of
chemical salts on plant growth. He received a patent for the manufacture
of superphosphate in 1842 and opened his first factory for its manufacture
at Deptford Creek, London, in 1843. In the same year he appointed
a chemist, J. H. Gilbert, as his assistant, and the systematic study of crop
nutrition at Rothamsted was begun. The experiments on wheat begun
in 1843 on Broadbalk field (which continue to this day) were the first of
many experiments involving most crops grown in England.

In 1889 Lawes set up the Lawes Agricultural Trust to ensure the
continuation of the Rothamsted experiments. He endowed the trust with
a sum of 100,000 pounds and granted a 100-year lease to the fields on
which the classical experiments were conducted.

The contribution of Rothamsted to agricultural science is apparent.
But why should the casual student of statistics be aware of its existence?
Because in 1919 there occurred an event of singular importance in the
history of statistics: R. A. Fisher came to Rothamsted.


25.2 The story of how Fisher came to Rothamsted is told by Sir E. John
Russell [7], who was the Director at the time. He writes:

But Cambridge had nothing to offer him and he left in 1913. For the next
two years he was unsettled: he volunteered for military service in the 1914-
18 war but was rejected on account of his eyesight, and for some time he
worked on a farm in Canada. From 1915 to 1919 he taught mathematics
and physics at various public schools: he was constitutionally unfitted for
the work and was neither happy or successful at it: when I first saw him in
1919 he was out of a job. Before deciding anything I wrote to his tutor at
Caius College whom I knew personally, asking him about his mathematical
ability. The answer was that he could have become a first class
mathematician had he "stuck to the ropes," but he would not. That looked
like the type of man wanted, so I invited him to join us. I had only £200,
and suggested that he should stay as long as he thought that the sum should
suffice, and after studying our records he should tell me whether they were
suitable for proper statistical examination and might be expected to yield
some more information than we had extracted. He reported weekly at tea at
my house and always favourably. It took me a very short time to realize that
he was more than a man of great ability: he was in fact a genius who must be
retained.*

25.3 Fisher was at Rothamsted from 1919 to 1933. During those 14 years he
made almost immeasurable contributions to the rapidly developing
subject of statistics. His papers on theoretical statistics still form the
foundation of much of modern statistics. Statistical methods devised by
Fisher are known and used throughout the world. His book Statistical
Methods for Research Workers [3], first published in 1925, has gone
through 14 editions; the last one was published in 1970, after Fisher's
death in 1962. Some measure of the tremendous outpouring of Fisher's
genius is given by the Collected Papers of R. A. Fisher [1], a five-volume
set containing over 300 separate articles. It is difficult to comprehend the
full sweep of Fisher's work, but many of the ideas of experimental design
that we present in the next several lectures can be traced either to
Fisher or to his associates and followers. Perhaps the greatest legacy to
statistics (or science in general) from Fisher's days at Rothamsted is
Fisher's work on experimental design as exemplified by his paper "The
Arrangement of Field Experiments" [4] and his book The Design of
Experiments [5], first published in 1935.

25.4 On the other side of the Atlantic in Ames, Iowa, a young associate
professor of mathematics, George Waddell Snedecor, had become
interested in statistics. Snedecor was born in Memphis, Tennessee, in
1881 but spent most of his childhood and early adulthood in Florida and
Alabama. He received a bachelor's degree in mathematics and physics
from the University of Alabama in 1905 and a master's degree in physics
from the University of Michigan in 1913. In 1913 he was appointed
assistant professor of mathematics at Iowa State University in Ames; he
was made associate professor the following year.

* Reproduced from The History of Agricultural Science in Great Britain, by Sir E.
John Russell. Copyright © 1966 by George Allen & Unwin (Publishers) Ltd. Reprinted
by permission from George Allen & Unwin (Publishers) Ltd.

As the interest in statistics at Ames developed, he was given
responsibility for teaching statistics courses; the first statistics course in
the mathematics department was offered in the 1914-15 academic year.

In the early 1920s Snedecor began a collaboration with Henry A. 
Wallace, who was later to become vice-president of the United States. 
Wallace was editor of Wallace's Farmer in Des Moines, Iowa, at the time 
and was interested in agricultural research. In the spring of 1924 
Snedecor organized a Saturday afternoon seminar to study multiple 
regression and other statistical methods under the guidance of Wallace. 
From these Saturday afternoon seminars came the publication in 1925 
of Correlation and Machine Calculation by Wallace and Snedecor. 

Snedecor and his colleagues were pioneering not only in statistics but 
in the use of punch cards and punch card machines to analyze data. This 
effort led to establishing the Mathematical Statistical Service in 1927 and 
the Statistical Laboratory in 1933. 

In 1931 the first Master of Science degree in statistics was awarded to 
Snedecor’s student Gertrude Cox, who would go on to establish the 
Institute of Statistics in North Carolina. In 1931 Snedecor invited Fisher 
to Ames to teach during the summer session. This summer session 
affected in many ways the development of statistical methods at Ames 
and around the world. 

In 1934 Snedecor published a book called Calculation and 
Interpretation of Analysis of Variance and Covariance. In 1937 there 
followed the first edition of his now classic Statistical Methods. This 
book has now gone through six editions, the sixth coauthored with 
William G. Cochran, and in 1974 had sold over 127,000 copies. 

Snedecor’s contributions were great. The statistics program at Ames 
attracted outstanding statisticians from all over the world, and the 
Department of Statistics at Iowa State University was the first in the 
United States. 


SUMMARY. The Rothamsted Experimental Station was established 
by Sir John Bennet Lawes on his family estate near Harpenden, England. 
The first experiments on wheat were begun in 1843. In 1889 Lawes 
established a trust fund to ensure the continuation of the experiments. 
When Fisher went to work at Rothamsted in 1919, the experiment
station was already world famous. Because of Fisher’s work there from 
1919 to 1933, this fame embraced statistics as well as agricultural 
research. At Rothamsted, Fisher developed basic statistical methods as 
well as the theoretical foundations. 

Snedecor, at Iowa State College, became interested in statistics in the 
early 1900s. This interest led to the organization of a consulting service in 
1927 and the Statistical Laboratory in 1933. Because of Snedecor’s 
interest in statistics, he invited Fisher to the United States in 1931. This 
visit had a great effect on statistics at Iowa State University and in the 
United States. 

RANDOMIZATION


26.1 Suppose one finds a magnetic compass that has been exposed to the 
weather. The needle no longer turns because of rust and corrosion. Is it 
reasonable to believe that the needle is pointing toward the north? If one
were hopelessly lost in a wood one might be inclined to act on the hope 
that the needle is pointing north. In fact, without knowledge of how the 
compass came to be in its present state, it is impossible to have a strong 
conviction about what direction the needle is pointing. We would have a 
totally different opinion if we were using a tried and proven compass on 
which the needle had come to rest after being spun gently. 

An analogous situation sometimes occurs where data seem to be 
pointing toward some particular conclusion. Fisher perceived that 
randomization provided a way of collecting data that would allow 
forceful conclusions—a way of “spinning the needle.” 

26.2 Let us then suppose, instead of finding a compass, that we come across
an experiment on the effect of two different fertilizers on the yield of
wheat. Three experimental plots have been given treatment 1, $T_1$, and
three plots treatment 2, $T_2$. The yield (in bushels per acre) has been
recorded as follows.


   T1      T2
  36.4    38.7
  40.2    39.1
  36.2    36.6

Beyond reporting that there is a difference in average yield of 0.53
bushels, how is one to proceed? Do the data clearly point to the
superiority of treatment 2? Since we do not know how the data were
arrived at, we cannot be sure. Perhaps treatment 2 was assigned to the
more fertile plots of ground or perhaps the treatment 1 plots were next to
a competing crop.

One way of proceeding with forming an opinion is to consider what
might happen if the treatments were assigned at random to the
experimental plots. Suppose that this were done and, in fact, treatments 1
and 2 were equal in their effect on the yield of wheat. Then the differences
between the yields obtained reflect properties of the plots only, and the
labeling by $T_1$ and $T_2$ is purely a random labeling. Let us consider the 20
different assignments of $T_1$ and $T_2$ that could be made to these six plots
(there are $\binom{6}{3} = 20$ ways of choosing the three plots that receive $T_1$).
In Table 26.1 these assignments are shown along with the resulting
means $\bar{T}_1$ and $\bar{T}_2$ and the differences $\bar{T}_2 - \bar{T}_1$.

The difference actually observed in average yields, 0.53 bushels per
acre, is now more meaningful when we consider the column of 20
possible differences that could have been observed.


Table 26.1 Randomization of Treatments*

Outcome  36.4  40.2  36.2  38.7  39.1  36.6  Mean T1   Mean T2   Difference
   1     T1    T1    T1    T2    T2    T2      37.60     38.13         0.53
   2     T1    T1          T1                  38.43     37.30        -1.13
   3     T1    T1                T1            38.57     37.17        -1.40
   4     T1    T1                      T1      37.73     38.00         0.27
   5     T1          T1    T1                  37.10     38.63         1.53
   6     T1          T1          T1            37.23     38.50         1.27
   7     T1          T1                T1      36.40     39.33         2.93
   8     T1                T1    T1            38.07     37.67        -0.40
   9     T1                T1          T1      37.23     38.50         1.27
  10     T1                      T1    T1      37.37     38.37         1.00
  11           T1    T1    T1                  38.37     37.37        -1.00
  12           T1    T1          T1            38.50     37.23        -1.27
  13           T1    T1                T1      37.67     38.07         0.40
  14           T1          T1    T1            39.33     36.40        -2.93
  15           T1          T1          T1      38.50     37.23        -1.27
  16           T1                T1    T1      38.63     37.10        -1.53
  17                 T1    T1    T1            38.00     37.73        -0.27
  18                 T1    T1          T1      37.17     38.57         1.40
  19                 T1          T1    T1      37.30     38.43         1.13
  20                       T1    T1    T1      38.13     37.60        -0.53

* T2 is assigned to the blank spaces.

First, note the occurrence of pairs of differences: 0.53 and -0.53, -1.13
and 1.13, -1.40 and 1.40, etc. This is entirely reasonable because if the
treatments were equal in effect, the sign of the difference is purely due to
the labeling, a random effect. Second, note that 16 of the 20 differences are
as large as or larger in absolute magnitude than the observed difference. That is,

$$P(|\bar{T}_2 - \bar{T}_1| \ge 0.53 \mid \text{equal effects}) = \frac{16}{20} = 0.80$$

Because of the high probability, the idea or hypothesis of equal
treatment effect seems quite plausible.
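
The enumeration behind Table 26.1 is easy to automate. The following is a minimal Python sketch, not part of the original lecture; it assumes the six yields are listed in the column order of Table 26.1 and that outcome 1 (the first three plots receiving $T_1$) is the assignment actually observed. The helper name `mean_difference` is illustrative only.

```python
from itertools import combinations

# Yields (bushels per acre) of the six plots, in the column order of Table 26.1.
yields = [36.4, 40.2, 36.2, 38.7, 39.1, 36.6]
observed_t1 = (0, 1, 2)  # plots that actually received T1 (outcome 1)

def mean_difference(t1_plots, data):
    """Return mean(T2) - mean(T1) for a given tuple of T1 plot indices."""
    t1 = [data[i] for i in t1_plots]
    t2 = [data[i] for i in range(len(data)) if i not in t1_plots]
    return sum(t2) / len(t2) - sum(t1) / len(t1)

# Enumerate all C(6, 3) = 20 ways of choosing the three T1 plots.
differences = [mean_difference(c, yields) for c in combinations(range(6), 3)]
observed = mean_difference(observed_t1, yields)

# Count assignments whose absolute difference is at least the observed one;
# the small tolerance guards against floating-point ties such as -0.53 vs 0.53.
tol = 1e-9
extreme = sum(1 for d in differences if abs(d) >= abs(observed) - tol)
significance_level = extreme / len(differences)

print(f"observed difference = {observed:.2f}")
print(f"{extreme} of {len(differences)} assignments; significance level = {significance_level:.2f}")
```

`itertools.combinations` yields the assignments in the same order as the rows of Table 26.1, so `differences` can be compared line by line with the last column of the table.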


26.3 We have just performed what was previously called a test of significance 
in Lecture 17. The general idea is that one devises a test statistic (the 
absolute difference of means in this example) and calculates the 
probability of statistic values as unlikely as that observed, given that the 
null hypothesis is true. This probability, called the significance level, 
provides a credibility index for the null hypothesis. Large values support 
and small values tend to negate the null hypothesis. 

It should be emphasized that the analysis performed in the previous 
section could be performed regardless of how the treatments were 
assigned to the experimental plots. However, Fisher recognized that the 
act of actually assigning the treatments at random makes the analysis far 
more meaningful. How it does this is not clear. Perhaps it is psychological;
it is not purely mathematical. We would like to say that it is
analogous to the compass. Experimental results that point in a 
particular direction are more readily understood if we know that equal 
opportunity has been provided for all directions. 


26.4 In addition to increasing the validity of the test of significance already
described, randomization provides a probability framework that enables
us to obtain unbiased estimates of treatment differences and to estimate
unbiasedly the variance of our estimate of treatment differences. To
understand this, let us suppose that the basal yields (yields that would be
obtained in a given year without fertilizer) for our six plots are, in fact, the
yields used in Table 26.1. Let us also suppose that the effect of treatment
1, $\tau_1$, is to add 2.0 bushels per acre, and the effect of treatment 2, $\tau_2$, is to
add 3.0 bushels per acre. Then the 20 possible values for $\bar{T}_2 - \bar{T}_1$ would
be those in Table 26.1 plus $\tau_2 - \tau_1 = 1.0$. That is, the 20
possible differences would be 1.53, -0.13, -0.40, 1.27, 2.53, 2.27, 3.93,
0.60, 2.27, 2.00, 0.00, -0.27, 1.40, -1.93, -0.27, -0.53, 0.73, 2.40, 2.13,
and 0.47. The average of all possible values for $\bar{T}_2 - \bar{T}_1$ is then
$\tau_2 - \tau_1 = 1.0$ bushel per acre. What has randomization achieved for us? It gives
us a clearly defined population of numbers (the possible values for
$\bar{T}_2 - \bar{T}_1$) whose mean is $\tau_2 - \tau_1$, the parameter of
interest. Furthermore, we have, in effect, selected one of the possible
values for $\bar{T}_2 - \bar{T}_1$ at random. In repetitions of the experiment we
would obtain, on the average, a difference of $\tau_2 - \tau_1$. Thus randomization
assures us of an unbiased estimate of the treatment difference
$\tau_2 - \tau_1$.

The variance of the population of 20 differences is calculated by the
formula $\sum_i (D_i - \bar{D})^2/20$, where $D_i$ is a typical difference and $\bar{D}$ is
their mean. Thus we find $\mathrm{Var}(\bar{T}_2 - \bar{T}_1) = 1.89$. Just as randomization
makes possible an unbiased estimate of $\tau_2 - \tau_1$, it also provides us with an
unbiased estimate of $\mathrm{Var}(\bar{T}_2 - \bar{T}_1)$. Suppose that the randomization were
that of the first line in Table 26.1. Then the results would be

T1                  T2
38.4, 42.2, 38.2    41.7, 42.1, 39.6

First, we estimate the inherent variability within the observations from
$T_1$ and within those from $T_2$. Then we pool our estimates.

From $T_1$, $s_1^2 = 5.08$
From $T_2$, $s_2^2 = 1.80$
Average, $s^2 = 3.44$

Our estimate of $\mathrm{Var}(\bar{T}_2 - \bar{T}_1)$ is then given by

$$s^2\left(\frac{1}{3} + \frac{1}{3}\right) = 3.44 \times \frac{2}{3} = 2.29$$
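
The claims of Section 26.4 can be checked by brute force. This minimal Python sketch assumes the additive model described above (basal yields from Table 26.1, $\tau_1 = 2.0$, $\tau_2 = 3.0$); the function name `observed_yields` is illustrative. It computes the mean and variance of the 20 possible differences, the pooled estimate for outcome 1, and the average of the pooled estimate over all 20 randomizations. Under this model that average equals the population variance, which is what unbiasedness means here.

```python
from itertools import combinations
from statistics import mean, variance

# Basal yields of the six plots (Table 26.1) and the treatment effects assumed in Section 26.4.
basal = [36.4, 40.2, 36.2, 38.7, 39.1, 36.6]
tau1, tau2 = 2.0, 3.0

def observed_yields(t1_plots):
    """Yields under one randomization: basal yield plus the effect of the treatment received."""
    t1 = [basal[i] + tau1 for i in t1_plots]
    t2 = [basal[i] + tau2 for i in range(len(basal)) if i not in t1_plots]
    return t1, t2

diffs, estimates = [], []
for plots in combinations(range(len(basal)), 3):
    t1, t2 = observed_yields(plots)
    diffs.append(mean(t2) - mean(t1))
    # Pooled within-treatment variance (sample variances use divisor n - 1),
    # multiplied by 1/3 + 1/3 to estimate Var(mean T2 - mean T1).
    pooled = (variance(t1) + variance(t2)) / 2
    estimates.append(pooled * (1 / 3 + 1 / 3))

# The 20 randomizations are equally likely, so the population variance uses divisor 20.
pop_mean = mean(diffs)
pop_var = sum((d - pop_mean) ** 2 for d in diffs) / len(diffs)

print(f"mean of the 20 differences     = {pop_mean:.2f}")
print(f"variance of the 20 differences = {pop_var:.2f}")
print(f"estimate from outcome 1        = {estimates[0]:.2f}")
print(f"average of the 20 estimates    = {mean(estimates):.2f}")
```

A single experiment gives only one of the 20 estimates (2.29 for outcome 1); it is the average over all equally likely randomizations that coincides with 1.89.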


26.5 In summary, randomization increases the validity of a test of significance,
provides a well-defined population with a mean of $\tau_2 - \tau_1$ and a variance
of $\mathrm{Var}(\bar{T}_2 - \bar{T}_1)$, and allows unbiased estimates of the treatment
difference and of the variance of $\bar{T}_2 - \bar{T}_1$.

We have already mentioned that Fisher is credited with the use of 
randomization in experimental design. However, the idea of random 
sampling from a population was known and used before Fisher. For 
example, Peirce [1] describes induction as “reasoning from a sample 
taken at random to the whole lot sampled." He insists on two rules of
inductive inference: (1) the sample must be random, and (2) the proposed
investigation of the sample must not be suggested by the characteristics
of the sample.


SUMMARY. The physical act of randomly assigning treatments to 
experimental plots justifies the use of a probability model for the 
observations. 

If the treatments have equal effects, the labeling of observations by 
treatment received can be viewed strictly as a random labeling. The 
difference between sample treatment means may be due purely to the 
particular labeling chosen. Therefore it is reasonable to consider all of 
the possible treatment assignments and the different values of the sample 
treatment means that might result from a given set of observations. This 
leads to an observed significance level: the probability, if the treatments
have equal effects, of a treatment difference at least as large as that observed.

If treatments have different effects, consideration of all possible 
random treatment assignments shows that the sample treatment 
differences estimate unbiasedly the population treatment differences. The 
population of sample treatment differences also has a variance that can 
be estimated unbiasedly from a sample. 

EXERCISES

1. Verify the values for $\bar{T}_2 - \bar{T}_1$ in Table 26.1.

2. Show that when $\tau_1 = 2.0$ and $\tau_2 = 3.0$, as in Section 26.4, the average
of all possible values for $\bar{T}_2 - \bar{T}_1$ is 1.0.

3. In Section 26.4 verify that $s_1^2 = 5.08$ and $s_2^2 = 1.80$ and, therefore,
that $\mathrm{Var}(\bar{T}_2 - \bar{T}_1)$ is estimated by 2.29.

4. Design and conduct a simple experiment involving two treatments
assigned at random. Estimate the treatment difference and then 
estimate the variance of your estimate. For example, you might study 
the effects of additives on automobile mileage, weight losses from 
different diets, or cleansing power of different detergents. 

5. A car club designs a small experiment to compare the mileage
obtained from two different blends of gasoline. Blend A is assigned at
random to two of five cars and blend B to the other three cars. The
average mileage obtained during the experiment is recorded for each
of the five cars.


Blend A    Blend B
  22.3       24.2
  20.7       25.1
             22.6


Calculate the observed significance level for the hypothesis that there 
is no difference between the average mileage for the two blends of 
gasoline. Follow the procedure in Section 26.2. 

6. In a course with multiple sections, three sections are selected at 
random. In these sections the instructor gives a quiz at the end of each 
hour. In the other three sections no quizzes are given. At the end of the 
course the students in all six sections take a common examination. 
The average final test scores are given for the two methods of 
instruction. 


Quiz    No Quiz
82.1     78.9
80.2     80.1
79.8     79.4


Calculate the observed significance level for the hypothesis that there 
is no difference in the two methods of instruction. 

7. Two members of a coffee club become involved in a dispute about the 
shape of coffee cups. One person claims that there is more heat loss 
from cups which flare outward at the top than from cups which are 
regular cylinders. They design a small experiment to compare the two 
shapes. Hot coffee is poured into five cups, two flared and three not.
They measure the heat loss (°F) over a 10-minute period. The results
are:


Flare    No Flare
 22         18
 24         23
            19


Estimate the mean difference in heat loss for the two populations of 
cups and estimate the variance of this estimate. Calculate the 
observed significance level for the hypothesis of no difference. 




GEORGE W. SNEDECOR, 1881-1974. Reproduced by permission from the
Department of Statistics, Iowa State University.





THE ANALYSIS
OF VARIANCE


27.1 The analysis of variance was published in 1923 by Fisher [1] just 5 years
after he used the word variance to describe the square of the standard
deviation [2]. Although Fisher is regarded as the originator of the
analysis of variance, some of the ideas certainly predate him. Stigler [3]
calls attention to the fact that Edgeworth carried out calculations much
like the analysis of variance as early as 1885. Since 1923 the analysis of
variance has become an extremely useful statistical analysis (perhaps
more frequently performed than any other). Countless articles and
books have been written concerning some facet of the analysis of
variance, and it has been theoretically described at virtually every level of
mathematics. In this lecture we wish to give some insight into the nature
of the analysis of variance at an introductory level, using relatively little
mathematics. However, some arithmetic is unavoidable.


27.2 In fact, the analysis of variance is not a study of variance exactly but of 
variation. For a concise, intuitive statement we commend the description 
of Snedecor [4]. “It is a technique for segregating from comparable 
groups of data the variation traceable to specified sources.”* This simple 
statement gives a good idea of what we are about and is a beacon to 
prevent our getting lost in computational details. 

Once we decide to try to segregate variation in our data by specified
sources of variation, we must decide how we are going to measure
variation. In fact, the only way of measuring variation that has proven
useful for this purpose is the sum of squares of deviations from the sample
mean. The analysis of variance, then, is concerned with partitioning the
total variation in the data (as measured by the sum of squares of
deviations from the sample mean) into bits of variation (also measured by
sums of squares) traceable to specified sources.

* Reprinted by permission from The Iowa State University Press from Snedecor,
Calculation and Interpretation of Analysis of Variance and Covariance, 1934.


27.3 For the first example of an analysis of variance, let us return to the
example in Lecture 23. The first exam scores (X) were used to describe
the final exam scores (Y). The predicted final exam scores ($\hat{Y}$) were the
final exam scores predicted by the least squares line. The data are
summarized in Table 27.1.

In the spirit of Snedecor's description we wish to partition and
segregate the variation traceable to specified sources. We have already
calculated the sum of squares of deviations from the sample mean to be

$$\sum (Y - \bar{Y})^2 = 341.875$$


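What are the possible sources for this variation? Obviously, one
reason that there is variation in the Y values is the apparent linear trend.
Let us calculate the sum of squares of deviations of the $\hat{Y}$ values from
the sample mean $\bar{Y}$ (the $\hat{Y}$ values have the same mean as the Y values,
since $\sum \hat{Y} = \sum Y$).

$$\sum \hat{Y}^2 = 48{,}437.1529$$

and

$$\sum \hat{Y} = \sum Y = 621$$

so

$$\sum (\hat{Y} - \bar{Y})^2 = 48{,}437.1529 - \frac{(621)^2}{8} = 48{,}437.1529 - 48{,}205.125 = 232.0279$$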


Table 27.1 Regression Example

         X      Y          Ŷ
        63     68    69.9749
        65     75    71.0581
        72     70    74.8493
        73     76    75.3909
        80     81    79.1821
        85     78    81.8901
        86     89    82.4317
        93     84    86.2229
Total  617    621   621.0000

The other source or reason for variation is that not all of the Y values
fall on the regression line. We measure this variation by $\sum d^2$, where $d$
represents the deviation of the observed Y value from the predicted value
$\hat{Y}$. That is, $d = Y - \hat{Y}$ and $\sum d^2 = \sum (Y - \hat{Y})^2$. In Lecture 23 we
calculated

$$\sum d^2 = 109.8471$$

No other sources of variation occur to us and we note that

$$341.875 = 232.0279 + 109.8471$$


or 

total SS = regression SS + residual SS, 

where SS denotes a sum of squares (of deviations from the sample mean for
the total and regression SS, and of deviations from the regression line for
the residual SS). It is conventional to record these results (along with
other quantities to be explained shortly) in a table similar to Table 27.2.
Notice that AOV is used as an abbreviation for analysis of variance.

The columns headed Source and SS should be clear. However, the 
columns headed df and MS require further explanation. Degrees of 
freedom, abbreviated df, are difficult to explain at the beginning, and the 
name seems strange. The name and meaning are related to the use of the 
expression in engineering and physics. Beginning students are perhaps 
well advised simply to accept the idea and terminology at first. Use and 
familiarity come to give meaning to degrees of freedom. The abbreviation
MS means mean square, and the entries in this column are obtained
by dividing each sum of squares by its degrees of freedom.

The immediate use that can be made of the mean square column (given 
certain assumptions) is to perform an F test of the hypothesis that the 
regression coefficient is zero. 

F = regression MS/residual MS 

= 232.0279/18.3078 
= 12.6737 


Table 27.2 Regression AOV

Source        df       SS           MS
Total          7    341.8750
Regression     1    232.0279    232.0279
Residual       6    109.8471     18.3078

From Table A.7, we see that the probability of an F this large with 1
degree of freedom for the numerator and 6 degrees of freedom for the
denominator is about .01. That is,

$$\mathrm{SL} = P(F \ge F_{\text{observed}}) \approx .01$$

Therefore we feel strongly that $\beta_1 \neq 0$.

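27.4 For those with more algebraic inclinations, the partition of the total sum
of squares in Table 27.2 is explained by the following algebraic
identity:

$$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

that is, total SS = regression SS + residual SS. The cross-product term
$2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$ vanishes because the $\hat{Y}_i$ come from the least
squares line.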
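
As a numerical check on Table 27.2 and the identity above, here is a minimal Python sketch using only the standard library. It refits the least squares line to the data of Table 27.1 rather than reusing the rounded $\hat{Y}$ values, so its sums of squares may differ from the table in the fourth decimal place. It also checks that the cross-product term is essentially zero and that the residual SS equals (total SS)$(1 - r^2)$.

```python
import math

# Table 27.1: first exam scores (X) and final exam scores (Y).
x = [63, 65, 72, 73, 80, 85, 86, 93]
y = [68, 75, 70, 76, 81, 78, 89, 84]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx                                   # least squares slope
y_hat = [y_bar + b * (xi - x_bar) for xi in x]  # predicted final exam scores

total_ss = syy
regression_ss = sum((yh - y_bar) ** 2 for yh in y_hat)
residual_ss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# Cross-product term from the identity in Section 27.4; zero up to rounding error.
cross = sum((yh - y_bar) * (yi - yh) for yi, yh in zip(y, y_hat))

# Correlation coefficient, used to express residual SS as total SS * (1 - r^2).
r = sxy / math.sqrt(sxx * syy)

residual_ms = residual_ss / (n - 2)   # 6 degrees of freedom
F = regression_ss / residual_ms       # regression has 1 degree of freedom

print(f"total SS      = {total_ss:.4f}")
print(f"regression SS = {regression_ss:.4f}")
print(f"residual SS   = {residual_ss:.4f}  (total SS * (1 - r^2) = {total_ss * (1 - r**2):.4f})")
print(f"cross-product = {cross:.1e}")
print(f"F = {F:.4f} with 1 and {n - 2} degrees of freedom")
```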


27.5 For the next example of an analysis of variance consider data that are 
grouped into several classes [e.g., (1) family incomes in several different 
cities, (2) ages of people using different libraries, and (3) IQs of people in 
different university classes]. Unlike the example of the preceding 
sections, there is no regression line and the recognizable sources of 
variation are simply between classes and within classes. In order to keep 
the arithmetic simple, let us consider another small example. Suppose 
that the data in Table 27.3 represent the weight gains of 12 animals in a 
nutrition experiment, with 4 animals receiving each of three possible
rations.


Table 27.3 Nutrition Experiment

              Ration A               Ration B                 Ration C
         Y    Ŷ   (Y-Ŷ)²       Y    Ŷ   (Y-Ŷ)²       Y      Ŷ     (Y-Ŷ)²
        24   27      9        24   29     25        25   27.75    7.5625
        30   27      9        26   29      9        30   27.75    5.0625
        26   27      1        32   29      9        27   27.75    0.5625
        28   27      1        34   29     25        29   27.75    1.5625
Total  108          20       116          68       111            14.7500
Mean    27                    29                  27.75

In Table 27.3 the weight gains are represented by Y's, and the other
columns have been constructed in a manner analogous to that for the
regression example. For ration A the sample mean 27 is taken as the
predicted value $\hat{Y}$, as are 29 for ration B and 27.75 for ration C. Then the
column $(Y - \hat{Y})^2$ follows in a natural manner. For this example,

$$\sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{12} = 9463 - 9352.0833 = 110.9167$$

Using the same notation as for the regression example,

$$\sum (\hat{Y} - \bar{Y})^2 = \sum \hat{Y}^2 - 9352.0833 = 4(27)^2 + 4(29)^2 + 4(27.75)^2 - 9352.0833 = 8.1667$$

Then the residual SS is

$$\sum (Y - \hat{Y})^2 = 20 + 68 + 14.75 = 102.75$$

Assembling these quantities in an analysis of variance table, we have 
Table 27.4. 

The degrees of freedom are arrived at in the following manner. The 
total degrees of freedom are one less than the number of observations, the 
between degrees of freedom are one less than the number of groups or 
classes, and the within degrees of freedom are obtained by subtracting. 
Alternatively, within degrees of freedom are obtained by adding the 
degrees of freedom within groups (3 + 3 + 3). 


Table 27.4 One-Way AOV

Source      df        SS          MS
Total       11     110.9167
Between      2       8.1667      4.0834
Within       9     102.7500     11.4167

The F test for testing equality of ration population means is simply

$$F = \frac{\text{between MS}}{\text{within MS}} = \frac{4.0834}{11.4167} = 0.36$$

From Table A.7 we see that the probability of an F this large is very
great (in fact, about .71 with 2 and 9 degrees of freedom), so the
hypothesis of equal means is quite acceptable.

27.6 The analysis of variance of the one-way classification just given can be
described simply by an algebraic identity. With n observations in each of
t groups, let $Y_{ij}$ denote the jth observation in the ith group, $M_i$ the mean
of the ith group, and M the overall mean. Then

$$\sum_{i,j} (Y_{ij} - M)^2 = n \sum_{i} (M_i - M)^2 + \sum_{i,j} (Y_{ij} - M_i)^2$$

that is, total SS = between SS + within SS.
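
The one-way analysis of Table 27.4 and the identity of Section 27.6 can be sketched the same way in Python. The p-value helper `f_tail_two_numerator_df` is an assumption of this sketch, not something from the lecture: it uses the closed form $P(F \ge f) = (1 + 2f/d_2)^{-d_2/2}$, which holds only when the numerator has 2 degrees of freedom, as it does here. Other cases need Table A.7 or a statistical library.

```python
# Table 27.3: weight gains of four animals on each of three rations.
groups = {
    "A": [24, 30, 26, 28],
    "B": [24, 26, 32, 34],
    "C": [25, 30, 27, 29],
}

all_y = [v for g in groups.values() for v in g]
grand_mean = sum(all_y) / len(all_y)
group_means = {name: sum(g) / len(g) for name, g in groups.items()}

# The three sums of squares of the identity in Section 27.6.
total_ss = sum((v - grand_mean) ** 2 for v in all_y)
between_ss = sum(len(g) * (group_means[name] - grand_mean) ** 2 for name, g in groups.items())
within_ss = sum((v - group_means[name]) ** 2 for name, g in groups.items() for v in g)

df_between = len(groups) - 1          # 2
df_within = len(all_y) - len(groups)  # 9
F = (between_ss / df_between) / (within_ss / df_within)

def f_tail_two_numerator_df(f, d2):
    """P(F >= f) for an F distribution with 2 and d2 degrees of freedom.

    Uses the closed form (1 + 2f/d2) ** (-d2/2); valid only for 2 numerator df.
    """
    return (1 + 2 * f / d2) ** (-d2 / 2)

p_value = f_tail_two_numerator_df(F, df_within)

print(f"total SS = {total_ss:.4f}, between SS = {between_ss:.4f}, within SS = {within_ss:.4f}")
print(f"F = {F:.4f}, P(F >= observed) = {p_value:.3f}")
```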

27.7 We have introduced the analysis of variance as a technique for 
partitioning the total variation into recognizable sources of variation. Our 
illustrations cover only simple linear regression and the one-way 
classification, so that there are only three sources of variation. 
Obviously, the full power is realized only when there are more sources of
variation. Nevertheless, the ideas in more complicated situations are 
similar to the ideas expressed in this lecture. 

27.8 Fisher gave the analysis of variance in his Statistical Methods for
Research Workers. However, the table he gave was for z, not F. This
required taking one-half the natural logarithm of the ratio of mean
squares, $z = \frac{1}{2}\ln F$. This awkward calculation was eliminated by the
tabulation of $F = e^{2z}$. Although the F table was first tabulated by
Mahalanobis in 1932, the F became widely known as Snedecor's F. The F
tables appear in Snedecor's book [4] that was published in 1934. Snedecor
says simply that the table was computed from Fisher's z table. Despite the
obvious advantage of F over z, Fisher continued to give the z tables in
subsequent editions of his Statistical Methods.
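
The relation between Fisher's z and F is a one-line transformation each way. A minimal Python sketch follows; the function names are illustrative, and the example reuses the regression F ratio of Section 27.3.

```python
import math

def z_from_f(f):
    """Fisher's z: one-half the natural logarithm of a ratio of mean squares."""
    return 0.5 * math.log(f)

def f_from_z(z):
    """Inverse transformation used to tabulate F: F = e^(2z)."""
    return math.exp(2 * z)

# Example: the regression F ratio of Section 27.3 on the z scale and back.
z = z_from_f(12.6737)
print(f"z = {z:.4f}; F recovered = {f_from_z(z):.4f}")
```

Because the transformation is monotone, a tail probability for z translates directly into one for F, which is why a table computed from Fisher's z table serves equally well.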

SUMMARY. The analysis of variance, first given by Fisher in 1923, is
one of the most frequently performed statistical analyses. Basically, it is
an analysis not of variance but of variation as measured by the sum of
squares of deviations from the sample mean. This sum of squares, called
the total sum of squares, is partitioned into pieces traceable to
recognizable sources of variation.

With simple linear regression, the total sum of squares is partitioned
into two terms: the sum of squares due to regression and the residual sum
of squares due to deviations from the regression line. The residual SS can
be written as residual SS = (total SS)$(1 - r^2)$.

With a one-way classification of data, the total SS is partitioned into
the between groups SS and the within groups SS.

Each sum of squares has certain degrees of freedom, and a mean square
is the sum of squares divided by its degrees of freedom. To test equality of
group means one uses an F statistic, which is the ratio of the between groups
mean square to the within groups mean square.

EXERCISES

1. Verify the analysis of variance in Table 27.2.

2. Verify the analysis of variance in Table 27.4.

3. Prove the identity in Section 27.6.

4. Construct an analysis of variance table such as Table 27.2 for the
data of Exercise 1 in Lecture 23. Using the F statistic, calculate the
significance level for $H_0\colon \beta_1 = 0$ versus $H_1\colon \beta_1 \neq 0$.

5. Repeat Exercise 4 for the data of Exercise 2 in Lecture 23. 

6. A researcher in child psychology designed an experiment that
required children to push a button when they recognized pictures
flashed on a screen. He measured the reaction times in seconds for
children in four age groups.


4-year   5-year   6-year   7-year
  40       37       34       32
  42       36       31       28
  38       35       32       27
  46       42       38       33
  41       48       37       29


Construct an analysis of variance for these data similar to Table 
27.4. Using the F statistic, calculate the significance level to test the 
equality of reaction times for the four age groups. 

7. Faculty members responding to a salary survey gave their rank,
salaries, and years in rank. The years in rank, X, and annual salaries,
Y, in thousands of dollars, follow.

X     Y        X     Y
5     25.5     20    42.1
10    27.5     12    29.4
8     28.3     16    33.4
15    32.5     8     27.8
18    36.3     2     23.1


Calculate the analysis of variance as in Table 27.2 and the F statistic.
What is the significance level for the hypothesis that the regression
coefficient is zero?




8. The cooking times for green beans reported by housewives across the
country at different altitudes were as shown in the following table.
Calculate the analysis of variance table and F statistic. Calculate the
significance level for the hypothesis that altitude does not affect the
cooking time. Altitude in thousands of feet = X and cooking time in
minutes = Y.




X      Y      X      Y
.6     25     6.0    58
.8     30     3.0    42
1.2    35     1.1    33
2.2    41     .5     20
3.6    46     .4     15
5.0    54     2.0    40

9. A sociologist, studying attitudes on religious issues, administers a
questionnaire to university students. Each student responds to 50
statements on a 1 to 5 scale, a higher response being a more
conservative response. Then the average response over the 50 items is
recorded for each student. The students fall into three groups: no
religious training, some religious training, and considerable religious
training.


None 

Some 

Considerable 

4.2 

2.4 

4.6 

4.0 

1.8 

3.9 

3.1 

2.2 

3.5 

2.7 

1.9 

3.7 

2.3 

2.4 

2.8 

3.3 

2.7 

3.4 

4.1 


3.6 



3.8 


Perform the analysis of variance and calculate the F statistic and the 
significance level for the hypothesis of no differences among the 
group means. 

10. At the twenty-fifth reunion of a high school graduating class, the 
participants completed an information sheet that gave their annual 
salary and highest university degree earned. Two responses that 
were very high were discarded. The remaining responses produced 
the following statistics. 



Degree   n    x̄        s
B.S.     65   17.250   2.125
M.A.     12   21.125   3.275
Ph.D.    8    26.340   2.960


Perform the analysis of variance, calculate the F statistic and 
calculate the significance level for the hypothesis that the mean 
salary is the same for all three groups. 



SOME RANDOMIZED EXPERIMENTS



EXPERIMENTAL DESIGN. A forestry experiment begun in 1929 
and photographed about 1945. Courtesy of The Forestry Commission, 
England. 










THE CONCEPT OF EXPERIMENTAL DESIGN


28.1 The world of experimental science is indebted to Fisher for his pioneering
work at Rothamsted. Most certainly experimental science did not begin
with Fisher, but experimental design, as we now use the term, did. Fisher's
article [3] paved the way for his work of monumental importance, The
Design of Experiments [4]. The ideas that flowed from that work have
been widely disseminated by books such as Snedecor’s Statistical
Methods [6] and Experimental Design by Cochran and Cox [1]. The
well-known books by Kempthorne [5] and Cox [2] are at a more
advanced level.


28.2 In a way the expression experimental design is misleading. We are not 
concerned with the design of laboratory equipment or the breeding of 
colonies of laboratory animals such as mice or birds, nor are we directly
concerned with the design of instruments to measure experimental 
responses. Instead, we are concerned with the logic of collecting and 
analyzing data. By experimental design we mean the plan for generating 
the data so that the analysis can use methods based on well-developed 
theory, tested and tried by one’s peers. 


28.3 It must also be emphasized that this lecture is concerned with a certain
type of experiment: a comparative experiment, not an absolute experiment.
An absolute experiment is conducted to determine the absolute
magnitude of a physical constant or the absolute existence or
nonexistence of some phenomenon. A comparative experiment, on the other
hand, is conducted to compare the relative values of several constants
and is concerned with the differences, or perhaps the ratios, among them.
For example, one successful transmission of the human voice by wireless
was enough to establish absolutely that radio programs were possible.
On the other hand, many comparative experiments were conducted in
order to arrive at today’s design for transmitters and receivers.

28.4 Next we present a certain way of describing a comparative experiment. 
Although it will not fit all cases, it describes most of them and has proven 
extremely useful. It might even be called the canonical form of a 
comparative experiment. We consider the case where stimuli are applied 
to some sort of experimental material, producing a measurable response. 
The stimuli are called treatments, a generic term only, and the 
experimental material is organized into units called experimental units. 
We will say more about the definition of an experimental unit. 
Measurement of the responses gives observations that are then analyzed. 
Finally, the statistical analysis must be interpreted by the experimenter. 
These ideas are portrayed in Figure 28.1. Although an experiment is 
designed with an analysis in mind, the design is determined by the way in 
which the experimental material is organized and the way in which the 
treatments are applied to the experimental material. 



Figure 28.1 Design of an Experiment. 


28.5 Science has come a long way through accidents, unforeseen outcomes, 
and chance observations. Nevertheless, the design and analysis of an 
experiment should not rest entirely on previously unknown and untried 
ad hoc methods. The analysis of an experiment generally is based on a 
mathematical model for the observations, and the mathematical model is 
related to and justified by the design. In the lecture on randomization we 
emphasized that the physical act of randomization justifies considering 
the conceptual population of different assignments of the treatments. It 
might even be said that randomization is the step that introduces 
probability models for the observations. This fact was perceived by 
Fisher and must be regarded as a fundamental contribution. 

Frequently probability models using the normal distribution are 
introduced into the analysis. The link between randomization and 
normally distributed observations is not well defined, but it seems to be 
justified on the basis of considerable experience. 

28.6 In Section 28.3 we referred briefly to the need for more observations in
comparative experiments than in absolute experiments. It is more than a
need for several observations. We must have several independent
randomizations of treatments to experimental units. Such independent
randomizations are called replications. To illustrate the idea of
replication, consider designing an experiment to compare the effects of three
different plant growth stimulants on green bean plants grown in a
greenhouse. Little credence would be attached to an experiment in which
treatments A, B, and C were assigned at random to one plant each. The
relative merits of A, B, and C would be dictated by the relative merits of
the three plants to which they were assigned. On the other hand, if the
three treatments are randomly assigned to many plants, we feel that the
plant differences will “average out,” and the relative effects of A, B, and C
will have a chance to manifest themselves.
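The claim that plant differences “average out” with replication can be illustrated by simulation. The sketch below assumes, purely for illustration, that plant merits are independent normal values with standard deviation 1 and that the stimulants have no real effect, so every estimated A − B difference is experimental error. Under those assumptions the variance of the estimate should be close to 2/r for r plants per treatment. The function name and seed are hypothetical.

```python
import random
import statistics


def variance_of_estimated_difference(plants_per_treatment, trials=2000, seed=1):
    """Simulate many randomized experiments in which treatments A and B have
    NO real effect, so every A-B difference comes from plant differences.
    Returns the variance of mean(A) - mean(B) across the simulated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Assumed plant "merits": independent normal draws with SD 1.
        plants = [rng.gauss(0.0, 1.0) for _ in range(2 * plants_per_treatment)]
        rng.shuffle(plants)  # random assignment of plants to A and B
        a = plants[:plants_per_treatment]
        b = plants[plants_per_treatment:]
        estimates.append(statistics.mean(a) - statistics.mean(b))
    return statistics.variance(estimates)


for r in (1, 5, 10):
    print(r, round(variance_of_estimated_difference(r), 3))
```

With one plant per treatment the variance is near 2; with ten it is near 0.2, which is the sense in which replication lets the treatment effects show through.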

Many observations do not necessarily mean that there have been 
many replications. Suppose that the plants in the greenhouse are 
arranged on three tables and the treatments are randomly assigned to 
the three tables, with all plants on a given table receiving one treatment. 
Although this would result in many observations, there is only one 
replication, and it has the same status as the experiment involving three 
plants only. As far as comparison of A, B, and C is concerned, it would
yield only three numbers, the total responses for the three tables 
involved. 

28.7 The comparison of treatments ultimately involves comparing two
treatments, so let us consider the problem of only two treatments. We
will calculate from our data an estimate $\hat{\tau}$ of the difference $\tau$
between the two treatment effects. This estimate will be in error. That is,

$$\hat{\tau} = \tau + e$$

or estimate = true effect + experimental error. What does the
experimental error include? Stated another way, what factors cause our
estimate to be anything other than the true effect? The following list has
been useful in trying to answer this question.

1. Differences in experimental units. 

2. Errors in applying treatments. 

3. Errors in measuring the response. 

4. Unknown factors. 

5. Factors known but ignored. 

All of these factors contribute to the experimental error. 






IDEAS OF STATISTICS 


In judging the significance of our estimate $\hat{\tau}$, we need to estimate the
variance of our experimental error (called, for short, the experimental error
variance). It seems almost obvious that any estimate of experimental
error variance should include all of the factors that produced the
experimental error in the first place. For example, in the greenhouse
experiment with the treatments assigned at random to the tables, the
experimental error variance cannot be estimated by differences among
plants on the same table, because these differences do not include table
differences.
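A simulation can make the table example concrete. The sketch below is hypothetical in every detail: it assumes table effects with standard deviation 2, plant effects with standard deviation 1, ten plants per table, and no real treatment effects. It compares the actual variance of the difference between two table means with the variance one would infer from plant-to-plant differences within tables.

```python
import random
import statistics


def table_experiment(plants_per_table=10, table_sd=2.0, plant_sd=1.0,
                     trials=2000, seed=2):
    """Treatments A, B, C are each assigned to a whole table, and none has a
    real effect. Compare the true variance of mean(A) - mean(B) with the value
    one would estimate from within-table (plant-to-plant) variation alone."""
    rng = random.Random(seed)
    differences, within_estimates = [], []
    for _ in range(trials):
        tables = []
        for _ in range(3):                       # one table per treatment
            table_effect = rng.gauss(0.0, table_sd)
            tables.append([table_effect + rng.gauss(0.0, plant_sd)
                           for _ in range(plants_per_table)])
        differences.append(statistics.mean(tables[0]) - statistics.mean(tables[1]))
        # Pooled within-table variance (equal table sizes), then the implied
        # variance of a difference of two table means, 2 * s^2 / m.
        pooled = statistics.mean([statistics.variance(t) for t in tables])
        within_estimates.append(2 * pooled / plants_per_table)
    return statistics.variance(differences), statistics.mean(within_estimates)


actual, within_based = table_experiment()
print(round(actual, 2), round(within_based, 2))
```

Under these assumptions the actual variance is near 2(4 + 1/10) = 8.2, while the within-table figure is near 0.2, so an error variance based only on differences among plants on the same table would badly understate the experimental error.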

28.8 In designing experiments we frequently look ahead to the analysis of
variance likely to result from the data, because the analysis of variance is
so often used for data from designed experiments. Some writers refer to
the analysis of variance, or to the model on which it is based, as the design.
However, we reserve the word design to describe the plan for assigning the
treatments to the experimental material.


SUMMARY. The ideas of experimental design draw heavily on the 
work of Fisher. In comparative experiments the object is to compare two 
or more stimuli called treatments. The treatments are assigned to units of 
experimental material, which results in responses and observations. The 
observations are then analyzed and the analysis interpreted. The design 
of the experiment is concerned primarily with the organization of the 
experimental material and the assignment of the treatments. 

Analysis of the data results in an estimate $\hat{\tau}$ of the treatment effect $\tau$,
but the estimate is in error. Much effort is concerned with estimating the 
variance of this experimental error. Replication (i.e., independent 
repetitions of treatments) has the effect of decreasing the experimental 
error variance. At the same time, it makes possible estimation of the 
experimental error variance. 

TWO-GROUP AND PAIRED EXPERIMENTS

29.1 In 1954 a public experiment of massive scale was conducted in the
United States to evaluate the effectiveness of the Salk vaccine. Meier [2]
gives an excellent account of the role that statistics played in the Salk
vaccine field trials. Involving over 1 million children at a total direct cost
of $5 million, the trials needed planning according to sound principles of
experimental design. Part of the study was planned as a two-group
design. About 200,000 children were inoculated with Salk vaccine;
another 200,000 children were inoculated with a control (a “sugar water”
that resembled the Salk vaccine in appearance). The polio rates for these
two groups were markedly different: eighty-two of the vaccinated children
contracted polio, whereas 162 of those who received the control contracted
polio. This demonstration of the effectiveness of the vaccine helped lead to
its general usage.

A much smaller two-group experiment is described by Pfeiffer 
[3, p. S6]. From the time of Aristotle, scientists had accepted the idea 
that certain forms of life were generated when other matter decayed. This 
idea was challenged in 1668 by Francesco Redi. Redi put meat in two 
jars, covering one and leaving the other uncovered. Meat in both jars 
decomposed, but only the open jar generated flies. This experiment
disproved the idea of spontaneous generation of life from decaying 
matter. 

We now wish to describe in detail two-group and paired experiments.
To simplify calculations, we will consider small-scale experiments.

How would you design an experiment to compare the effectiveness of
two headache remedies? Both of the remedies, A and B, are advertised by
their manufacturers as being superior to the other. We are interested in
only one aspect of effectiveness: time required for relief from pain.
Suppose that all of the 20 students in your statistics class have
volunteered to participate in whatever experiment you design.

In this case one of the major difficulties is in measuring time to relief 
because what constitutes relief from pain for one person presumably will 
not for another. Also, people will feel the need to take headache remedies 
at different thresholds of pain. After studying the matter of pain and 
reviewing the literature on similar experiments, you decide to proceed as 
follows. When one of your volunteers experiences a headache, he or she 
will take the prescribed remedy when the pain becomes severe and 
record the time required before becoming comfortable. 


29.2 Having devised a measure of time to relief, you must now decide which 
randomized design to use: a two-group design or a paired design. The 
two-group design would consist of selecting some of the 20 volunteers at 
random to receive remedy A; the others would receive remedy B.
Generally, we would select 10 for A and 10 for B. The paired design calls
for matching or pairing the subjects into 10 pairs so that the individuals 
in each pair are as much alike as possible and then, for each pair, 
randomly assigning A to one individual and B to the other individual. If 
there is a sound basis for pairing, the idea is that within each pair we will 
get a very precise comparison of A with B, and we can then combine all 
10 of these precise comparisons. If, on the other hand, there seems to be 
no basis for pairing, the two-group design would likely be preferred 
because of its simplicity. In the problem under discussion you would 
need to have considerable information about reaction to pain of your 
volunteers before you could expect to pair them intelligently. We will 
continue this lecture by following through examples for both designs 
being discussed. 
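The two randomization schemes can be sketched in Python. The function names, the seed, the labels S1 to S20, and the pairing of consecutive labels are all hypothetical; in a real paired experiment the pairs would be formed from information about the volunteers' reactions to pain.

```python
import random


def two_group_assignment(subjects, rng):
    """Two-group design: choose half the subjects at random for remedy A;
    the rest receive remedy B. One random assignment in all."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": sorted(shuffled[:half]), "B": sorted(shuffled[half:])}


def paired_assignment(pairs, rng):
    """Paired design: within each pair, a fair coin decides who gets A.
    With n pairs there are n independent random assignments."""
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:
            plan.append({"A": first, "B": second})
        else:
            plan.append({"A": second, "B": first})
    return plan


rng = random.Random(29)                       # seed fixed only for reproducibility
volunteers = [f"S{i}" for i in range(1, 21)]  # hypothetical labels for the 20 students
print(two_group_assignment(volunteers, rng))

# Hypothetical pairing: consecutive labels stand in for pairs matched on
# reaction to pain, which in practice requires prior information.
pairs = [(volunteers[i], volunteers[i + 1]) for i in range(0, 20, 2)]
print(paired_assignment(pairs, rng))
```

The two-group function makes a single random split, while the paired function makes one independent random assignment per pair, which is the essential difference between the two designs.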


29.3 If you decide to use the two-group design, you do so knowing before 
observations are ever collected how you are going to analyze the data. 
The data can be analyzed using calculations such as those illustrated in 
the randomization lecture. More commonly, the analysis will be based 
on the model that observations A and B represent two random samples 
from two normal populations. The analysis will be directed toward 
forming opinions about the means and variances of these populations. In 
what follows we will illustrate the analysis of the data in Table 29.1, 
assuming that the two population variances are equal and that we wish 
to compare the means. 

Table 29.1 Data from Two-Group Experiment: Relief Times (Minutes) from A and B

        A      A²       B      B²
        25     625      32     1024
        16     256      44     1936
        20     400      46     2116
        30     900      28     784
        46     2116     36     1296
        22     484      48     2304
        83     6889     53     2809
        26     676      76     5776
        34     1156     58     3364
        26     676      42     1764
Total   328    14,178   463    23,173
Mean    32.8            46.3



The relief times are given in columns A and B. The squares of these 
numbers are given in columns A² and B². Before performing any other
calculations, note the average relief times for A and B. Remedy A 
produced relief 13.5 minutes faster on the average, so one might 
reasonably ask why further calculations are necessary. Isn’t it obvious 
that the experiment proves A is better than B? But wait! Among those 
individuals receiving remedy A and among those individuals receiving B 
there are differences larger than 13.5. The proper approach is to try to 
judge whether the difference of 13.5 in the means is significant when we 
consider the differences among experimental units treated alike. Put 
another way, we are interested in the probability of a difference as large 
as 13.5 occurring by chance even if the population means for A and B are 
equal. To find this probability, we will estimate the standard deviation of 
the difference of means and calculate a t value. 

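$$\bar{A} = 32.8 \qquad \bar{B} = 46.3$$

$$\frac{(\sum A)^2}{10} = 10{,}758.4 \qquad \frac{(\sum B)^2}{10} = 21{,}436.9$$

$$\sum (A - \bar{A})^2 = \sum A^2 - \frac{(\sum A)^2}{10} = 14{,}178 - 10{,}758.4 = 3419.6$$

$$\sum (B - \bar{B})^2 = \sum B^2 - \frac{(\sum B)^2}{10} = 23{,}173 - 21{,}436.9 = 1736.1$$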

Assuming that the two populations have the same variance, it seems
reasonable to use the variation from both samples to estimate the
common variance. This estimate is given by pooling (or adding) the sums
of squares of deviations about the sample means and dividing by the
pooled (or added) degrees of freedom. That is,

$$s^2 = \frac{\text{pooled SS}}{\text{pooled df}} = \frac{3419.6 + 1736.1}{9 + 9} = 286.4278$$

Now the variances of the sample means are estimated by $s^2/n_A$ and $s^2/n_B$.
Because the variance of a difference of independent means (as well as of
their sum) is the sum of the variances, the variance of the mean difference
$\bar{A} - \bar{B}$ is estimated by

$$s^2_{\bar{A}-\bar{B}} = s^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right) = (286.4278)\left(\frac{1}{10} + \frac{1}{10}\right) = 57.2856$$

Then the standard deviation for $\bar{A} - \bar{B}$ is simply the square root of the
estimated variance. This gives

$$s_{\bar{A}-\bar{B}} = \sqrt{57.2856} = 7.57$$

Finally, the t to test the hypothesis of no treatment difference,
$H_0\colon \mu_A - \mu_B = 0$, versus the alternative of a treatment difference,
$H_1\colon \mu_A - \mu_B \neq 0$, is simply

$$t = \frac{\bar{A} - \bar{B} - 0}{s_{\bar{A}-\bar{B}}} = \frac{-13.5}{7.57} = -1.78$$
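The arithmetic of this section can be reproduced with a short Python sketch (the helper names `sum_of_squares` and `pooled_t` are hypothetical). It uses the same computing formula for the sums of squares and the same equal-variance pooling as the text.

```python
import math

# Relief times (minutes) from Table 29.1.
a = [25, 16, 20, 30, 46, 22, 83, 26, 34, 26]
b = [32, 44, 46, 28, 36, 48, 53, 76, 58, 42]


def sum_of_squares(values):
    """Sum of squared deviations about the mean, via the computing
    formula sum(y^2) - (sum y)^2 / n used in the text."""
    return sum(v * v for v in values) - sum(values) ** 2 / len(values)


def pooled_t(x, y, hypothesized_difference=0.0):
    """Two-sample t assuming equal population variances."""
    n_x, n_y = len(x), len(y)
    pooled_var = (sum_of_squares(x) + sum_of_squares(y)) / (n_x - 1 + n_y - 1)
    se_diff = math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    diff = sum(x) / n_x - sum(y) / n_y
    t = (diff - hypothesized_difference) / se_diff
    return diff, pooled_var, se_diff, t, n_x + n_y - 2


diff, s2, se, t, df = pooled_t(a, b)
print(f"diff={diff:.1f} s^2={s2:.4f} se={se:.2f} t={t:.2f} df={df}")
# prints diff=-13.5 s^2=286.4278 se=7.57 t=-1.78 df=18
```

Passing `hypothesized_difference=3` would give the test of a nonzero hypothesized difference described in the next paragraph.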

If the hypothesized difference had been something other than zero (say
3), then 3 and not zero would have been subtracted from $\bar{A} - \bar{B}$. The next
step in this test of significance is to calculate the probability of a t larger
than this in absolute magnitude. This probability, or significance level,
SL, is given by

$$\text{SL} = P(|T| > 1.78)$$

which, from Table A.5, is about .10. Thus the difference of 13.5 is a little
unlikely if the population means are equal, but it is not compelling evidence
in favor of unequal means, as we might have surmised.

We can graph values of SL versus various hypothesized values of
$\mu_A - \mu_B$ to form a more complete picture of tenable values for the
population mean difference. From such a graph we would find that all
values of $\mu_A - \mu_B$ between

$$\bar{A} - \bar{B} - 2.101\,s_{\bar{A}-\bar{B}} \quad\text{and}\quad \bar{A} - \bar{B} + 2.101\,s_{\bar{A}-\bar{B}}$$

or −29.4 and 2.4, will produce an SL greater than .05. We might then say that all such
values are in consonance with (in agreement with) the data at the .95 level.
Such an interval is called a consonance interval by Kempthorne and
Folks [1]. Here 2.101 is simply the tabulated t value with 18 degrees of
freedom such that the probability of a larger value in absolute magnitude
is .05.
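In place of Table A.5, the significance level can be approximated by integrating the t density numerically. This sketch uses Simpson's rule, a choice of ours rather than the book's method, with hypothetical function names; the consonance interval uses the tabulated value 2.101 quoted in the text rather than computing it.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_sl(t, df, steps=2000):
    """SL = P(|T| > |t|), integrating the density over [0, |t|] with
    Simpson's rule (steps must be even) and using symmetry about 0."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_half = total * h / 3          # P(0 < T < |t|)
    return 1 - 2 * central_half


def consonance_interval(estimate, std_error, t_crit):
    """Hypothesized values whose two-sided SL exceeds the probability
    level at which t_crit was tabulated."""
    return estimate - t_crit * std_error, estimate + t_crit * std_error


print(round(two_sided_sl(-1.78, 18), 3))
print(consonance_interval(-13.5, 7.5687, 2.101))
```

The integration gives an SL of roughly .09 for t = −1.78 with 18 df, consistent with the table reading of about .10, and the interval reproduces −29.4 and 2.4.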


29.4 Next we will illustrate the analysis using a paired design. We will not give
as much discussion at each step. The idea of a significance test is the
same: we calculate a t value and then determine the probability of a
larger value in absolute magnitude. In Table 29.2 we give results that
might have been obtained under columns A and B.

Remember that one of the reasons for doing a paired experiment is to
get a precise comparison of treatments within each pair. The first step is
then to take the difference between the relief times for each pair. In the
example we have calculated D = B − A. We could equally well have
calculated A − B for each pair. It does not matter which we do, as long as
we do the same for all pairs. The model we now use as the basis for further
analysis is that the 10 differences (Ds) comprise a random sample from a
normal population with mean $\mu_D = \mu_B - \mu_A$ and variance $\sigma_D^2$. We want
to perform a test of significance for $H_0\colon \mu_D = 0$ versus $H_1\colon \mu_D \neq 0$.

Table 29.2 Data from Paired Experiment

Pair    A     B     D = B − A   D²
1       36    37     1            1
2       44    48     4           16
3       61    72    11          121
4       26    38    12          144
5       32    38     6           36
6       33    30    -3            9
7       45    58    13          169
8       42    48     6           36
9       56    63     7           49
10      53    58     5           25
Total               62          606


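The variance of D is estimated by $s_D^2 = \sum (D - \bar{D})^2/(n - 1)$, where n is
the number of pairs. Now

$$\sum (D - \bar{D})^2 = \sum D^2 - \frac{(\sum D)^2}{10} = 606 - 384.4 = 221.6$$

so

$$s_D^2 = 221.6/9 = 24.6222 \quad\text{and}\quad s_D = 4.96$$

The variance of $\bar{D}$ is estimated by

$$s_{\bar{D}}^2 = s_D^2/n = 2.4622$$

and

$$s_{\bar{D}} = \sqrt{2.4622} = 1.5691$$

with 9 degrees of freedom.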

The t value is then given by

$$t = \frac{\bar{D} - 0}{s_{\bar{D}}} = \frac{6.2}{1.5691} = 3.951$$

(For $H_0\colon \mu_D = c$ we would subtract c.) From Table A.5 we determine
(using 9 df) that

$$\text{SL} = P(|T| > 3.951) < .01$$

A .95 consonance interval is given by

$$\bar{D} \pm 2.262\,s_{\bar{D}}$$

where 2.262 is the tabulated t value with 9 degrees of freedom, or
6.2 ± 3.55, that is, 2.65 and 9.75.

Thus values of $\mu_D$ between 2.65 and 9.75 would yield significance levels
greater than .05.
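A Python sketch of the paired analysis of Table 29.2, working only with the differences D = B − A. The function name `paired_t` is hypothetical, and the critical value 2.262 (t with 9 df, .05 two-sided) is taken as given rather than computed.

```python
import math

# Relief times from Table 29.2, listed pair by pair.
a = [36, 44, 61, 26, 32, 33, 45, 42, 56, 53]
b = [37, 48, 72, 38, 38, 30, 58, 48, 63, 58]


def paired_t(x, y, hypothesized_mean=0.0, t_crit=2.262):
    """Paired analysis: work with the differences D = y - x only.
    t_crit is the tabulated t for n - 1 df (2.262 for 9 df, as in the text)."""
    d = [yi - xi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n
    ss = sum(di * di for di in d) - sum(d) ** 2 / n   # sum of (D - D_bar)^2
    s_d = math.sqrt(ss / (n - 1))
    se = s_d / math.sqrt(n)                           # s for D_bar
    t = (d_bar - hypothesized_mean) / se
    interval = (d_bar - t_crit * se, d_bar + t_crit * se)
    return d_bar, s_d, se, t, interval


d_bar, s_d, se, t, (low, high) = paired_t(a, b)
print(f"D_bar={d_bar:.1f} s_D={s_d:.2f} se={se:.4f} t={t:.3f} "
      f"interval=({low:.2f}, {high:.2f})")
# prints D_bar=6.2 s_D=4.96 se=1.5691 t=3.951 interval=(2.65, 9.75)
```

Only the ten differences enter the calculation, which is why the paired analysis has 9 rather than 18 degrees of freedom.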


29.5 A striking illustration of a two-group experiment is given by data from 
part of an experiment performed by Wakeman and Kaplan [5]. Burn 
patients were assigned to one of two groups: medication only and 
medication under hypnosis. The percent of allowable medication was 
recorded for each patient. For four patients receiving hypnosis the 
average percent was 40. For five patients receiving medication only the 
average percent was 80. Thus the patients receiving hypnosis required a 
much lower level of medication. 


29.6 The results of an interesting paired experiment are reported by Sackeim,
Gur, and Saucy [4]. Pictures of human faces displaying different
emotions were split down the middle, and left-side and right-side
composites were constructed. The right-side composite was formed by
taking the right side of the original face and using its mirror image to
form the left side. Similarly, the left-side composite was a symmetric face
with the right side being the mirror image of the left side of the original
face. Subjects were then shown slides of left- and right-side composites
and asked to rate the intensity of expression on a 1 to 7 scale. The
conclusions of the experiment are indicated by the title of their report:
“Emotions Are Expressed More Intensely on the Left Side of the Face.”


SUMMARY. Two of the simplest experimental designs are the paired 
and two-group designs. The paired design assigns two treatments to 
experimental units at random within each pair of experimental units. If 
there are n pairs, there are n independent random assignments. The two- 
group design assigns one treatment to some of the experimental units 
chosen at random and the other treatment to the other experimental 
units. Thus, with the two-group experiment, there is only one random 
assignment. 

The paired experiment results in n pairs of observations, and the 
analysis is performed on the n differences. The model for the analysis is 
that the n differences are a random sample from a single normal 
distribution. 

The two-group experiment results in two samples of observations, and
the model for the analysis usually is that we have independent random
samples from two normal populations with equal variances but possibly
different means.

Several examples are given of both types of experiments. 

EXERCISES

1. Verify the calculations in Section 29.3. 


2. Verify the calculations in Section 29.4. 


3. There is often confusion among beginning students of statistics
about whether two groups of data should be regarded as paired or
unpaired. From the viewpoint put forward in this lecture, how can
we tell? What is the essential difference in randomization between a
paired experiment and a two-group experiment?

4. Wakeman and Kaplan do not give the basic data in their paper, only
the averages. Suppose that the percents of allowable medication
requested by nine patients are as follows.

Medication Only   Hypnosis/Medication
85                26
92                46
76                37
81                51
66

Calculate the t statistic to compare means and determine the
observed significance level for the null hypothesis that $\mu_1 = \mu_2$
versus the alternative hypothesis that $\mu_1 \neq \mu_2$.

5. Sackeim, Gur, and Saucy do not give their raw data; they report
only the conclusions. Suppose that we submitted 10 each of left- and
right-side composites to a panel of eight judges. Each judge rated
each composite on a 1 to 7 scale. The resulting 10 pairs of average
ratings are as follows.

Right-Side Composite   Left-Side Composite
3.375                  3.625
4.750                  4.875
5.125                  5.250
6.250                  6.125
3.750                  4.125
4.875                  5.250
5.125                  5.875
6.250                  6.625
5.875                  6.250
5.250                  6.375

Calculate the t statistic for comparing means. Determine the
observed significance level for testing $H_0\colon \mu_R = \mu_L$ versus
$H_1\colon \mu_R \neq \mu_L$.

6. Cub scouts build and race small model cars in a derby. At a local 
elimination meet, the race track had two lanes. The boys felt that the 
left lane was faster than the right lane, so some trial runs were made 
with each car running in both the left and right lanes. The running 
times were clocked with a stopwatch, and the times are as follows. 


Car   Left   Right
1     5.1    5.3
2     5.6    5.4
3     4.9    5.2
4     4.8    5.2
5     5.3    5.4
6     5.6    5.5
7     5.6    5.6
8     5.3    5.2
9     5.1    5.3


Calculate the t statistic for comparing means. Determine the
observed significance level for testing $H_0\colon \mu_L = \mu_R$ versus
$H_1\colon \mu_L \neq \mu_R$.






7. Each member of an 18-member taste panel is presented with two
pieces of sirloin steak in random order. One of the pieces has been
aged for 6 months before cooking; the other is from recently
slaughtered beef. Each member of the panel rates the two pieces of
beef on a five-point scale. The ratings follow.

Aged  Fresh     Aged  Fresh     Aged  Fresh
4     3         5     1         3     3
4     2         5     2         4     3
5     2         5     3         4     2
4     4         5     4         2     2
5     3         3     2         3     1
3     1         4     2         4     3

Test the hypothesis that there is no difference between the mean
ratings versus the hypothesis that there is. Use α = .05.

8. A restaurant is considering two different brands of casual china. One
concern is how well the coffee cups hold the heat. A two-group
experiment is designed to compare the heat losses from the two
brands of china cups. The statistics from the experiment are:

          n    x̄      SS
Brand A   20   18.2   82.62
Brand B   20   22.3   78.96

Test $H_0\colon \mu_A = \mu_B$ versus $H_1\colon \mu_A \neq \mu_B$. Use α = .01.

9. A researcher in business administration is concerned with the effects 
on office efficiency of the cocktail for lunch. A two-group experiment 
is used. Some typists selected at random from a group of volunteers 
are given 1 ounce of alcohol during lunch. Others are given 2 ounces. 
During the first hour after lunch, the typing speed and error rate are
recorded for each participating typist. Calculate the observed
significance level for the hypothesis of no difference in mean typing
speeds and in error rates.

                   Typing Speed      Error Rate
             n     x̄     SS          x̄     SS
One ounce    12    55    24,122      5.2   76.41
Two ounces   14    62    26,304      7.4   82.33


10. In a large department each instructor is assigned one lower-division
course and one upper-division course. At the end of the term, student
evaluations are administered for all courses. This results in a pair of
overall ratings for each instructor. Test the null hypothesis that there
is no difference between student evaluation ratings for upper- and
lower-division courses. Use α = .05.

Upper   Lower
3.21    2.93
3.41    3.12
2.98    2.86
3.35    2.68
3.67    3.25
3.24    3.01
3.13    2.94




COMPLETELY RANDOMIZED DESIGN. Wisconsin fields. USDA 











30

COMPLETELY RANDOMIZED AND RANDOMIZED BLOCK DESIGNS


30.1 In this lecture we illustrate two very simple (but much used) designs: the
completely randomized (or completely random) and the randomized
block designs. As in the last lecture, we will illustrate the analysis using
small sets of artificial data in order to keep the arithmetic simple. After a
discussion of the planning of the experiment, we will illustrate the
arithmetic required to perform an analysis of variance and a subsequent
F test of significance.


30.2 Appreciation for the design of experiments may be more easily acquired
if one is actually involved in a research situation or is well
acquainted with the research of others. Lectures 28, 29, and 30 may seem
rather abstract to a student who has never seen or been
involved with a research project. Therefore, as an illustration, let us consider
an experiment to compare the effect of three different methods of teaching
on student final scores in a statistics course.

1. Method A. Regular statistics lectures supplemented with handout 
materials describing real examples from current research. 

2. Method B. Same as method A with the additional requirement that 
the students design, conduct, and analyze data from experiments of 
their own. 

3. Method C. Regular statistics lectures with no other special 
requirements. 

The difficulties inherent in conducting a meaningful experiment of this
type are immense. Yet many such experiments have been conducted by
researchers in education; e.g., experiments have been conducted to
compare the “new math” curriculum materials with traditional
materials.

We will suppose that we have located 12 classes of students for which 
the instructors have agreed to participate in our experiment, and we are 
considering whether to use a completely randomized design or a 
randomized block design. A completely randomized design would 
consist of assigning method A to some classes chosen at random from the 
12, method B to some classes chosen at random, and method C to the 
other classes. A randomized block design, on the other hand, would 
consist of grouping (blocking) the classes into four groups (blocks) of 
classes and, within each block, assigning the three methods of teaching to 
the classes at random. In terms explained in Lecture 28, the methods of 
teaching are the treatments and the classes of students are the 
experimental units. Let us give explicit descriptions of the two designs. 


Completely Randomized Design. We have n experimental units and t
treatments. Treatment 1 is assigned to $n_1$ units chosen randomly,
treatment 2 to $n_2$ units chosen randomly, etc. The numbers of replications
($n_1, n_2, \ldots$) may or may not be equal.

Randomized Block Design. We have bt experimental units grouped 
into b blocks of t units each and t treatments. Within each block, the 
treatments are randomly assigned to the experimental units. 
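The two definitions translate directly into randomization procedures. In this sketch the function names, the seed, the class labels, and the grouping of consecutive classes into blocks are hypothetical; in practice blocks would group classes expected to respond alike.

```python
import random


def completely_randomized(units, replications, rng):
    """Assign treatment i to n_i units chosen at random.
    replications: dict mapping treatment -> n_i (must sum to len(units))."""
    if sum(replications.values()) != len(units):
        raise ValueError("replications must account for every unit")
    shuffled = list(units)
    rng.shuffle(shuffled)
    plan, start = {}, 0
    for treatment, n_i in replications.items():
        plan[treatment] = shuffled[start:start + n_i]
        start += n_i
    return plan


def randomized_blocks(blocks, treatments, rng):
    """Within each block of t units, assign the t treatments in random
    order, independently from block to block."""
    plan = []
    for block in blocks:
        if len(block) != len(treatments):
            raise ValueError("each block needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)
        plan.append(dict(zip(block, order)))
    return plan


rng = random.Random(30)                           # fixed seed for reproducibility
classes = [f"class{i}" for i in range(1, 13)]     # hypothetical labels for 12 classes
print(completely_randomized(classes, {"A": 4, "B": 4, "C": 4}, rng))

# Hypothetical blocking: consecutive classes stand in for blocks of similar classes.
blocks = [classes[i:i + 3] for i in range(0, 12, 3)]
print(randomized_blocks(blocks, ["A", "B", "C"], rng))
```

The completely randomized function performs one shuffle of all 12 units, whereas the block function performs a separate shuffle inside each of the b = 4 blocks.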


30.3 We will now suppose that we have decided to use a completely 
randomized design and that we have obtained average class scores for 
each of the 12 classes, as shown in Table 30.1 under the columns headed 
A, B, and C. 


Table 30.1 Data from Completely Randomized Experiment

        A     A²       B     B²       C     C²
        79    6,241    83    6,889    67    4,489
        81    6,561    86    7,396    72    5,184
        86    7,396    91    8,281    74    5,476
        77    5,929    84    7,056    68    4,624
Total   323   26,127   344   29,622   281   19,773
As a model for the analysis that follows, we will regard the data arising
from the three methods as random samples from three different
normal populations with means $\mu_A$, $\mu_B$, and $\mu_C$ and a common variance
$\sigma^2$. The analysis of variance that we give in Table 30.2 is the same as was
illustrated in Lecture 28 for a one-way classification, although we will
obtain the sums of squares in a slightly different fashion.

The correction factor (CF) is given by $(\text{grand total})^2/n$. That is,

$$\text{CF} = \frac{(948)^2}{12} = 74{,}892$$

The total sum of squares is obtained as the sum of the squares of every
observation minus the correction factor. The between-groups and within-groups
sums of squares are obtained as shown.

$$\text{Total SS} = 75{,}522 - 74{,}892 = 630$$

$$\text{Between SS} = \frac{(323)^2}{4} + \frac{(344)^2}{4} + \frac{(281)^2}{4} - \text{CF} = 75{,}406.5 - 74{,}892 = 514.5$$

$$\text{Within SS} = \text{total SS} - \text{between SS} = 630 - 514.5 = 115.5$$


Because the between sum of squares measures variation between groups created by
different treatments, we will call it the treatment sum of squares. We call
the within sum of squares the error sum of squares because the within
mean square is an estimate of the experimental error variance, which we
described in Lecture 28.


Table 30.2 Completely Randomized Analysis of Variance

Source        df      SS        MS        F
Total         11     630
Treatments     2   514.5   257.250   20.046
Error          9   115.5    12.833
To test the hypothesis that there are no differences among the treatment
effects, we use the F ratio.

$$F = \frac{\text{treatment MS}}{\text{error MS}} = \frac{257.250}{12.833} = 20.046$$

Next we calculate the probability of a larger F. Using Table A.7 with 2
degrees of freedom for the numerator and 9 degrees of freedom for the
denominator, we have

$$\text{SL} = P(F > 20.046) < .001$$

Using the significance level as an index of credibility for the hypothesis of
equal treatment effects, we would likely conclude that the treatment
effects are different. In fact, the sample mean for treatment B is the largest
of the three, and we would be inclined to accept method B as the most
effective way of teaching statistics.
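As a check on the arithmetic, the sketch below reproduces the completely randomized analysis of Table 30.1 in Python. The helper names `f_tail_2` and `completely_randomized_anova` are hypothetical. In place of Table A.7 it uses the exact tail area of an F distribution with 2 numerator degrees of freedom, $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, a closed form that holds only when the numerator has 2 degrees of freedom.

```python
def f_tail_2(f, d2):
    """Exact P(F > f) for an F distribution with 2 and d2 degrees of freedom.

    This closed form is valid only when the numerator has 2 degrees of freedom.
    """
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


def completely_randomized_anova(groups):
    """Sums of squares and F ratio for a one-way (completely randomized) layout."""
    observations = [x for group in groups for x in group]
    n, t = len(observations), len(groups)
    cf = sum(observations) ** 2 / n                  # correction factor
    total_ss = sum(x * x for x in observations) - cf
    # Each squared treatment total is divided by the number of observations in it.
    treatment_ss = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    error_ss = total_ss - treatment_ss
    df_treatment, df_error = t - 1, n - t
    f = (treatment_ss / df_treatment) / (error_ss / df_error)
    return {"CF": cf, "total SS": total_ss, "treatment SS": treatment_ss,
            "error SS": error_ss, "F": f, "df": (df_treatment, df_error)}


table_30_1 = [
    [79, 81, 86, 77],  # method A
    [83, 86, 91, 84],  # method B
    [67, 72, 74, 68],  # method C
]
result = completely_randomized_anova(table_30_1)
sl = f_tail_2(result["F"], result["df"][1])
print(result)
print(f"F = {result['F']:.3f}, SL = {sl:.4f}")
```

Run as written, it gives CF = 74,892, total SS = 630, treatment SS = 514.5, and error SS = 115.5, with F of about 20.05 and an exact tail area of about .0005, consistent with the small significance level read from Table A.7.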


30.4 If, in fact, we have some reason for grouping experimental units (classes
in the current example) together as blocks, we should consider doing so
and then use a randomized block design. For example, we might have the
situation that the 12 classes are taught by four instructors, each
instructor teaching three classes. This groups the classes into four blocks
with three classes each and, within each block, we would randomly
assign treatments to classes. The average class scores might be as in
Table 30.3.

The analysis of variance for the randomized block design was worked
out by Fisher in the early 1920s and was given in a footnote to a paper by
“Student” in 1923 [1]. It partitions the total variation into variation
due to blocks, treatments, and experimental error.

Table 30.3 Data for Randomized Block Design

                           Treatments
Blocks       A       A²     B       B²     C       C²   Total
1           74    5,476    79    6,241    63    3,969     216
2           76    5,776    83    6,889    75    5,625     234
3           84    7,056    89    7,921    67    4,489     240
4           79    6,241    94    8,836    76    5,776     249
Total      313   24,549   345   29,887   281   19,859     939

The computation of these sums of squares proceeds as follows.


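$$\text{CF} = \frac{(939)^2}{12} = 73{,}476.75$$

$$\text{Total SS} = 74{,}295 - \text{CF} = 818.25$$

$$\text{Blocks SS} = \frac{(216)^2}{3} + \frac{(234)^2}{3} + \frac{(240)^2}{3} + \frac{(249)^2}{3} - \text{CF} = 73{,}671 - \text{CF} = 194.25$$

$$\text{Treatments SS} = \frac{(313)^2}{4} + \frac{(345)^2}{4} + \frac{(281)^2}{4} - \text{CF} = 73{,}988.75 - \text{CF} = 512.00$$

$$\text{Error SS} = \text{total SS} - \text{blocks SS} - \text{treatments SS} = 112.00$$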


We have avoided giving algebraic formulas, hoping to convey the 
method of calculation by example. One characteristic of analysis of 
variance calculations is that when a number is squared, it is almost 
always divided by the number of basic observations going into it. An 
inspection of the foregoing calculations will show this happens several 
times. 
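The same rule (square a total, then divide by the number of observations that went into it) can be sketched in Python for the randomized block layout of Table 30.3. The function and variable names are hypothetical, and the significance level again uses the closed-form F tail area, which is valid only for 2 numerator degrees of freedom.

```python
def f_tail_2(f, d2):
    """Exact P(F > f) for an F distribution with 2 and d2 degrees of freedom.

    Valid only when the numerator has 2 degrees of freedom.
    """
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


def randomized_block_anova(table):
    """table[i][j] is the observation in block i under treatment j."""
    b, t = len(table), len(table[0])
    observations = [x for row in table for x in row]
    cf = sum(observations) ** 2 / (b * t)            # correction factor
    total_ss = sum(x * x for x in observations) - cf
    # A block total is the sum of t observations ...
    block_ss = sum(sum(row) ** 2 for row in table) / t - cf
    # ... and a treatment total is the sum of b observations.
    treatment_totals = [sum(row[j] for row in table) for j in range(t)]
    treatment_ss = sum(tot ** 2 for tot in treatment_totals) / b - cf
    error_ss = total_ss - block_ss - treatment_ss
    df_treatment, df_error = t - 1, (b - 1) * (t - 1)
    f = (treatment_ss / df_treatment) / (error_ss / df_error)
    return {"CF": cf, "total SS": total_ss, "blocks SS": block_ss,
            "treatments SS": treatment_ss, "error SS": error_ss,
            "F": f, "df": (df_treatment, df_error)}


table_30_3 = [
    [74, 79, 63],  # block 1: treatments A, B, C
    [76, 83, 75],  # block 2
    [84, 89, 67],  # block 3
    [79, 94, 76],  # block 4
]
result = randomized_block_anova(table_30_3)
sl = f_tail_2(result["F"], result["df"][1])
print(result)
print(f"F = {result['F']:.2f}, SL = {sl:.4f}")
```

It returns the sums of squares computed above (818.25, 194.25, 512.00, and 112.00), an F of about 13.71 on 2 and 6 degrees of freedom, and an exact tail area of about .006.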

Finally, we assemble our calculations, including an F value and a
significance level, as in Table 30.4. Again, because of the small value of
the significance level, we would be inclined to feel that the teaching
methods do have different effects.


Table 30.4 Analysis of Variance—Randomized Block

Source        df       SS       MS       F     SL
Total         11   818.25
Blocks         3   194.25
Treatments     2   512.00   256.00   13.71   .006
Error          6   112.00    18.67
30.5 In the last three lectures we have indicated the power of statistics in
experimental science. We have restricted ourselves to elementary 
examples and have avoided mathematical notation as much as possible. 
This will afford the students a glimpse of the statistical design of 
experiments. 


SUMMARY. The completely randomized design and the randomized
block design are extensions of the two-group and paired designs,
respectively, to experiments with more than two treatments. With the
completely randomized design, treatment 1 is assigned to some units
chosen randomly, treatment 2 to some units chosen randomly, etc. With
the randomized block design, the treatments are assigned at random to
the units within each block. With the completely randomized design the
sources of variation are total, treatments, and error. With the randomized
block design the sources of variation are total, blocks, treatments,
and error.

With either design the F ratio for treatments is formed by dividing the
treatment mean square by the error mean square. The observed significance
level is the probability of getting a larger F by chance when there
are no differences among the treatment effects.
EXERCISES

1. Verify the analysis of variance in Table 30.2 and the resulting 
significance level. 

2. Verify the analysis of variance in Table 30.4 and the resulting 
significance level. 

3. A director of single-student housing was concerned with the effect of
housing policies on student scholarship and conducted a randomized
block experiment. Twenty-four students, having given their permission
to random dormitory assignment, were blocked into four blocks
of six students each. The blocking was done on the basis of previous
grade point averages, the students within any block having similar
scholastic records. The treatments were six dormitories with distinctly
different rules and regulations. Within each block of students,
the students were randomly assigned to these six dormitories. At the
end of the year, the grade point averages were as follows.


                   Blocks
Treatments      1      2      3      4
1            3.45   3.68   2.25   2.68
2            2.98   3.35   2.10   2.45
3            2.93   3.17   2.32   2.36
4            3.23   3.52   2.26   2.39
5            3.17   3.95   2.47   2.78
6            3.01   3.78   2.34   2.64


Calculate an analysis of variance table similar to Table 30.4 and 
calculate a significance level to test the hypothesis of equal treatment 
effects. Given these data, do you think dormitory assignment affects 
grades? What is your opinion about the quality of the experiment? 

4. A distinguished author has coauthored books with four different
people. It is of interest to study whether the writing style is markedly
different in the four books. One small study concentrated on the
frequency of the use of prepositions. Passages were selected from each
of the books, and the number of prepositions in 20-line sections was
recorded.


         Book
     1     2     3     4
    28    29    26    39
    31    33    24    27
    17    33    22    35
    25    35    19    34
    26    24    23    28
    22    28    25    34
    24    29    33    30
a. Calculate the analysis of variance and F statistic to test equality of
the mean number of prepositions. 

b. Calculate the observed significance level. 

c. Identify the treatments and experimental units. 

5. A grocer is experimenting with different display schemes in the soft
drink sections of several stores. He uses a randomized block design
to compare the effects of three different schemes on the total weekly
sales volume (dollars) in soft drinks. In each store each display scheme
is tried for one week, the order being decided at random in each store.


            Scheme
Store      A      B      C
1       9250   8670   7675
2       1125   1075   1250
3       2420   2780   2525
4       3465   3615   3625
5       4213   3860   3855


a. Calculate the analysis of variance and F statistic to test equality of 
mean sales volume for the three schemes. 

b. Calculate the observed significance level. 

c. Identify the treatments and experimental units. 

6. A student-faculty committee investigating the effectiveness of student 
evaluation forms designed a randomized block experiment. Three 
different versions of the evaluation form were designed. On form A all 
statements were positively worded. On form B all statements were 
negatively worded. Form C used some negative statements and some 
positive statements. Three departments from each of four different 
colleges participated in the study. Within each college the forms were 
randomly assigned to departments. The average departmental faculty 
ratings are given here. 


              College
Form      1      2      3      4
A      2.65   2.58   2.76   2.76
B      2.91   2.74   2.87   3.01
C      2.72   2.61   2.53   2.82


a. Calculate the analysis of variance and F statistic to test equality of 
mean ratings for the three forms. 
b. Calculate the observed significance level.

c. Identify the treatments and experimental units. 

7. Students from a psychology class volunteered to participate in a 
reading experiment. The time required to read a word list was 
recorded. Three types of word lists were prepared, distinguished from 
each other by the number of words repeated at least twice. The word 
lists were assigned at random to students, and the reading times (in 
seconds) were as shown. 


       Word Lists
       1       2       3
    10.1    14.2    15.2
    10.2    15.1    12.9
    11.5    16.2    14.3
    15.3    13.7    14.8
    14.6    13.0    15.0
    13.1    12.6    14.7
    13.9    14.3    13.1


a. Calculate the analysis of variance and F statistic to test for equality 
of mean reading times. 

b. Calculate the observed significance level. 

c. Identify the treatments and experimental units. 

8. The trainer for a university football team uses a randomized block
design to compare the effect of different breakfasts on the running speed
of the players. The breakfasts are cereal, ham and eggs, toast and rolls,
and pancakes. The blocks consist of four players each from the
offensive backs, interior linemen, pass receivers, and defensive backs.
An hour after breakfast the players run a 100-yard dash, and the times
are recorded. Calculate the analysis of variance and test the
hypothesis that there is no difference among breakfasts in their effect
on the speed of the players.




                      Blocks
                      1      2      3      4
Cereal              9.8   10.2   10.1   10.0
Ham and eggs       10.2   10.8   10.2   10.5
Toast and rolls    10.0   10.2   10.4   10.2
Pancakes            9.7   10.8   10.2   10.3
9. The staff of a university newspaper is experimenting with different
formats. A completely randomized experiment is designed to compare
three different formats. Volunteer students in a journalism class
are given one of three formats and, after reading the news stories, take
a reading comprehension test. This gives the following statistics.


Format    n      X̄
1        10     90    4.37
2        10     86    3.76
3        11     83    4.21


Perform the analysis of variance, calculate the F statistic, and 
determine the significance level for the hypothesis that the mean 
reading comprehension is the same for all formats. 


IMPORTANT CONFLICTS OF IDEAS

BAYES FAMILY VAULT. Grave of Thomas Bayes in Bunhill Fields, the
burial ground for many notable nonconformists. Reprinted with permission of
Stephen Stigler.
THE GREAT BAYESIAN CONTROVERSY

31.1 About half a mile north of Gresham College, where Karl Pearson gave his
lectures on probability and statistics, lies Bunhill Fields. Across City
Road from the chapel and home of John Wesley, Bunhill Fields, burial
ground for 120,000 souls, is the famous cemetery of the nonconformists,
disused since 1852. Here are the graves of John Bunyan, Daniel Defoe,
Isaac Watts, and Susannah Wesley, among many other famous people.
Here also are the graves of Richard Price and Thomas Bayes.

Readers of current statistics textbooks and journals may search in
vain for the names of Fisher and Pearson but may see repeated references
to Bayes. Beginning students might well conclude that Bayes, not
Pearson, was the founder of modern statistics! In fact, Bayes made
almost no contribution to statistics. Why, then, the repeated references
to Bayes? The answer seems to lie in a paper by Bayes [1] that has
become the focal point for the mode of reasoning called Bayesian
statistics. This paper, published in 1763, was submitted after Bayes'
death by his friend Richard Price, who was known for the development
of life insurance principles. The paper contained what has become
known as Bayes’ theorem, which consists of nothing more than an
elaboration of the rules for conditional probability given in Lecture 8. To
understand why so much attention has been given to Bayes' theorem, it
is necessary to remind ourselves of ways of thinking about probability.

31.2 In Lecture 7 we discussed the interpretation of probability and
emphasized that there are essentially two ways of conceiving of
probability: (1) as a physical property inherent in a physical system, and
(2) as a measure of belief in the truth of some statement. Hacking [4] also
identifies these two major conceptions of probability. From the time of
Pearson’s first lectures in probability and statistics until the late 1950s, 
the overwhelming majority of statisticians held the first view of 
probability with the probability of an event being the relative frequency 
with which the event occurred or might occur. During this time, Bayes’ 
theorem was used only in situations where probability could be 
interpreted within the frequency framework. Its use in statistics was 
almost nonexistent. Rethinking of the meaning of probability has led to
a tide of papers in statistics in which probability is regarded as a measure 
of belief. The recent surge of publications in Bayesian statistics started in 
1960. Virtually all of about 175 references in Lindley’s 1970 bibliography 
on Bayesian statistics [5] are to papers written in the 1960s. 

Since 1960 the case for and against Bayesian statistics has been stated 
and argued with all of the passion and heat that marked the Pearson- 
Fisher quarrel. Always the argument seems to hinge on how we interpret 
probability. Let us now try to illustrate why this is crucial for the use of 
Bayes’ theorem. 


31.3 Take a simple urn problem of the type so useful in statistics. We are 
presented with an urn known to be either urn I or urn II. The contents of 
the urns are as shown: 


Urn I. 4 red balls and 1 white ball 
Urn II. 1 red ball and 4 white balls 

We draw one ball at random from the urn presented to us and obtain a 
red ball. What is the probability that we have been presented with urn I, 
given that we obtained a red ball? Using the rules given in Lecture 8, we
would proceed as follows. 

$$P(\mathrm{I} \mid \text{red}) = \frac{P(\text{red} \mid \mathrm{I})\,P(\mathrm{I})}{P(\text{red})}$$

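Now a red ball can occur in two mutually exclusive ways: with urn I or
with urn II. Therefore,

$$P(\text{red}) = P(\mathrm{I} \text{ and red}) + P(\mathrm{II} \text{ and red}) = P(\text{red} \mid \mathrm{I})\,P(\mathrm{I}) + P(\text{red} \mid \mathrm{II})\,P(\mathrm{II})$$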
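so, finally, we have

$$P(\mathrm{I} \mid \text{red}) = \frac{P(\text{red} \mid \mathrm{I})\,P(\mathrm{I})}{P(\text{red} \mid \mathrm{I})\,P(\mathrm{I}) + P(\text{red} \mid \mathrm{II})\,P(\mathrm{II})} = \frac{(4/5)\,P(\mathrm{I})}{(4/5)\,P(\mathrm{I}) + (1/5)\,P(\mathrm{II})}$$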


In order to get a numerical answer we need P(I) and P(II), but these 
have not been given in the statement of the problem. The frequentist 
point of view is that there is no way to proceed further in the calculation 
of the desired probability. The view may be that the state of the urn 
(either urn I or urn II) is completely fixed and, since there is no 
randomness, there is no way of discussing P(I) or P(II). On the other 
hand, even if we know that the choice of urn was made by a random 
process, we do not know the probabilities associated with urn I and urn 
II and can proceed no further. 

The Bayesian point of view goes something like the following. 
Probability is a measure of our belief, and we can always express our 
degree of belief in terms of probability. Complete ignorance about 
whether the urn was I or II would be expressed with probabilities of 1/2 
for both. Using these probabilities, we would obtain

$$P(\mathrm{I} \mid \text{red}) = \frac{(4/5)(1/2)}{(4/5)(1/2) + (1/5)(1/2)} = \frac{4}{5}$$

On the other hand, suppose we strongly believed that the urn was urn II.
We might use P(I) = 1/10 and P(II) = 9/10. Then we would have

$$P(\mathrm{I} \mid \text{red}) = \frac{(4/5)(1/10)}{(4/5)(1/10) + (1/5)(9/10)} = \frac{4}{4 + 9} = \frac{4}{13}$$

Continuing to think of probability as a measure of belief, look at these
calculations. The occurrence of a red ball should increase the probability
of urn I. This it has done in both cases: in the first case from P(I) = 1/2 to
P(I | red) = 4/5, and in the second from P(I) = 1/10 to P(I | red) = 4/13.
Incidentally, the idea of using equal probabilities to represent
complete ignorance was used by Bayes in the 1763 paper and is called the
Bayes postulate. It is also known by other names, such as Laplace's
principle of insufficient reason.
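The urn calculations can be checked with exact fractions. In this sketch `posterior` is a hypothetical helper that applies Bayes' theorem to any finite set of hypotheses: it multiplies each prior probability by the probability of the data under that hypothesis and divides by the sum of these products, which is P(red) computed by the rule of total probability.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' theorem for a finite set of hypotheses.

    priors[h] is P(h); likelihoods[h] is P(data | h).
    """
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    p_data = sum(joint.values())             # total probability of the data
    return {h: joint[h] / p_data for h in joint}


p_red = {"I": Fraction(4, 5), "II": Fraction(1, 5)}   # P(red | urn)

ignorance = posterior({"I": Fraction(1, 2), "II": Fraction(1, 2)}, p_red)
strong_ii = posterior({"I": Fraction(1, 10), "II": Fraction(9, 10)}, p_red)
print(ignorance["I"], strong_ii["I"])
```

It prints `4/5 4/13`, the two posterior probabilities of urn I found above. Using `Fraction` rather than floating point keeps the answers in the same exact form as the hand calculation.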


31.4 We will now use the simple urn example as a base for describing 
Bayesian statistics. We can think of the urn number as a parameter, and 
our data consist of the color of the ball drawn. Before we draw the ball 
and observe its color, we have probabilities about the parameter value 
[P(I) and P(II)]. The probabilities are called prior probabilities because 
they exist before or prior to our collecting data. After we collect our data, 
it seems reasonable to calculate new probabilities, which are called 
posterior probabilities, or after-the-fact probabilities. These posterior 
probabilities are, using Bayes’ theorem, the conditional probabilities of 
the parameter value given the data. 

Now consider the problem of making inferences about the mean $\mu$ of a
normal population on the basis of a sample from that normal
population. Statistics based on a frequency interpretation of probability
treats $\mu$ as a fixed parameter for which no probability statements can be
made, either before or after the sample. Inferences about $\mu$ are based on
estimates of $\mu$, calculation of significance levels, consonance intervals,
etc. However, the Bayesian approach is to use prior probabilities for $\mu$
and Bayes’ theorem to obtain posterior probabilities for $\mu$.

31.5 It is still not clear how the controversy is going to be resolved. Prior to 
1960 virtually all statistics avoided the Bayesian approach. Fisher had 
steadfastly avoided the use of Bayes’ theorem and, in his last book [3], 
strongly rejected it once more. Although Pearson had on occasion used 
Bayes’ theorem, he had, in the main, avoided it. Neyman and Pearson 
had decided against its use in their landmark paper on testing 
hypotheses. However, Bayesian statistics has attracted many followers 
since 1960, and further development may enhance its usefulness. The 
case for Bayesian statistics is forcefully stated by Lindley [5]. For a 
presentation of arguments against Bayesian statistics, refer to the paper
by Berkson [2].


SUMMARY. Bayes wrote a paper that was published after his death by 
his friend Richard Price. This paper, published in 1763, contained a 
theorem that has become known as Bayes’ theorem. It allows the 
calculation of one conditional probability given another conditional
probability and certain marginal probabilities. The usefulness of the
theorem hinges on the interpretation of probability.

If a parameter $\theta$ can take two possible values, $\theta_1$ and $\theta_2$, then

$$P(\theta_1 \mid \text{data}) = \frac{P(\text{data} \mid \theta_1)\,P(\theta_1)}{P(\text{data} \mid \theta_1)\,P(\theta_1) + P(\text{data} \mid \theta_2)\,P(\theta_2)}$$

In order to use the formula, $P(\theta_1)$ and $P(\theta_2)$ must be known. The
Bayesian view of probability is as a measure of belief, so $P(\theta_1)$ and $P(\theta_2)$
are considered as measures of belief prior to collecting the sample. After
collecting the data, these probabilities are modified by calculating the
conditional probabilities $P(\theta_1 \mid \text{data})$ and $P(\theta_2 \mid \text{data})$. These latter
probabilities are called posterior probabilities.
THE FISHER-PEARSON CONTROVERSY


32.1 Karl Pearson was born in London in 1857. Before his death in 1936 he 
was to author more than 600 articles and books covering subjects from 
the history of art to theoretical statistics. He must be regarded as 
the founder of modern statistics. A summary of his life is given by his 
son, E. S. Pearson [4]. For other articles see Walker [8], Stouffer [7],
Wilks [9], and Haldane [2]. 

What was to become a long association with University College 
London began when he was sent to University College School in 1866. 
He was withdrawn by his parents for health reasons and placed under a 
private tutor at Hitchin, about 30 miles north of London, in 1873. A year 
later he went to Cambridge and worked under another tutor, as was 
customary. In 1875 he was awarded a scholarship at King's College, 
Cambridge, and took his degree with mathematical honors in 1879. 

Supported by a King’s College Fellowship, Pearson devoted the next several
years to an incredible program of work, study, and travel.
He studied and wrote in Germany, and in London he studied law and 
was called to the bar in 1881. He taught some classes at King's College, 
London, and at University College London. In 1884 he was appointed 
to the Chair of Applied Mathematics and Mechanics at University 
College London. During the next several years he worked hard at 
teaching mathematics to engineering students and at lecturing and 
writing on widely ranging subjects. 


32.2 In London there is still a building at the corner of Gresham and
Basinghall streets with a worn sign that says Gresham College. Sir
Thomas Gresham (1519-1579), English financier and member of Queen
Elizabeth's privy council, was the founder of the Royal Exchange. He
invested the rents from 100 shops in the Royal Exchange for the purpose
of establishing a college in which seven professors were to lecture on
“divynitye, astronomy, musick, geometry, law, physicke, and rethoricke.”
From 1579 until 1768 the college was housed in Gresham's old
home, a medieval manor house. Thereafter, lectures were delivered at the
Royal Exchange until 1838. A new college building was erected at the
present location in 1842; the present building, housing the Graduate
Business Centre of the City of London University, dates from 1913. In the
year that R. A. Fisher was born, Pearson was appointed as the nineteenth
lecturer in geometry at Gresham College on December 15, 1890. He held
this appointment concurrently with his chair at University College until
he resigned the Gresham College lectureship on June 12, 1894. At
Gresham College he gave a series of popular lectures on probability and
statistics; syllabuses of these lectures are included in the volume by E. S.
Pearson [4]. The modern age of statistics had begun. In these lectures in
1893 he first used the expression standard deviation and began referring to
the probability law of errors as the normal distribution.

Following the Gresham College lectures, Pearson gave at University 
College from 1894 to 1895 and 1895 to 1896 the first university courses on 
the theory of statistics. An outline of these courses is included by E. S. 
Pearson [4]. 

Karl Pearson is called the founder of biometrics, and his laboratories
at University College are often referenced. Modern visitors to University
College may search in vain for traces of the Biometrics Laboratory,
although the Galton Laboratory still exists in fact and name in the
Department of Human Genetics and Biometry. Pearson himself traced
the beginnings of the Biometrics Laboratory to 1895. With Francis
Galton, Pearson founded the journal Biometrika; the first volume
appeared in 1901. In 1907 he accepted supervision of the Galton
Laboratory, founded 3 years earlier by Francis Galton. Upon the death
of Francis Galton in 1911, the Galton Chair of Eugenics was established
under Galton's bequest, and Pearson gave up his chair of applied
mathematics to accept the Galton chair and to establish a new
department, the Department of Applied Statistics. In 1912 Fisher
published his first paper.

In 1911 money was given by Sir Herbert H. Bartlett to construct the 
building on Gower Street to house the new department. The building 
was completed in 1914 but, because of World War I, was not fully 
occupied until 1920. The statistics program at University College is still
housed there.
In 1925, at 68 years of age and with a brilliant career already behind
him, Pearson launched another major activity with the founding of the
Annals of Eugenics, a journal devoted to the study of human genetics. In
more recent years the name has been changed to the Annals of Human
Genetics.

In 1933, with much regret, Pearson resigned his chair, and his 
department was split. Fisher was appointed to the Galton chair to head 
the Department of Eugenics, and Pearson’s son, E. S. Pearson, was 
named to head the new Department of Applied Statistics. 

Pearson continued to work in his beloved profession until he died in 
1936. 


32.3 An account of the life of Fisher is given by Kendall [3] and Yates and
Mather [10]. The entire issue of Biometrics (June 1964) is devoted to 
Fisher, and the article by Pearson [6] provides a glimpse of Fisher at 
University College. Ronald A. Fisher was born in London in 1890. 
During the years that the foundations were being laid for biometry and 
modern statistics at University College, Fisher was receiving a good 
education at Stanmore Park School and Harrow School. He entered 
Cambridge University, where he specialized in mathematics, in 1909. His 
first scientific paper was published in 1912, the same year that he 
graduated from Cambridge, and 3 years later his famous paper giving 
the probability distribution of the correlation coefficient r appeared in 
Biometrika. Although Fisher subsequently submitted a few papers to 
Biometrika for consideration, he never published another paper in that 
journal. In 1919 he was offered a post at University College, but he 
accepted instead the job at Rothamsted (see Lecture 25). 

Until his death in 1962, he produced a flood of books and papers on 
many aspects of applied as well as theoretical statistics. His most famous
statistics books are Statistical Methods for Research Workers (1925) and
The Design of Experiments (1935). His work brought him fame
throughout the statistical world, and prizes and honors were heaped on 
him. He was knighted in 1952. 

We have already mentioned that Fisher succeeded Pearson in 1933 in
the Galton chair at University College. In 1943 he returned to
Cambridge to accept the Arthur Balfour Chair of Genetics. In 1957 he
retired, but a few years later he accepted a research fellowship in
Australia in the Division of Mathematical Statistics of the Commonwealth
in Adelaide. He died there following an operation in 1962.
32.4 For many years the statistical world was enlivened by the fierce dispute
that developed between Pearson and Fisher. There may be little point in 
trying to determine exactly how the dispute materialized. However, the 
fact of its existence is an important part of the story of statistics, and any 
students who read the works of Fisher and Pearson will become aware of 
this fact. Egon Pearson [5] gives some interesting clues to its origins that 
are contained in correspondence among Fisher, “Student,” and Pearson. 

The disagreements between Fisher and Pearson involved more than 
personalities; they involved different philosophies of statistics. The 
Pearsonian viewpoint, particularly as modified and developed by 
Pearson and Neyman, has led to a view of statistics as a decision-making
science, while the statistical heirs of Fisher have come to think of
statistics more as a science of data analysis and formation of opinion. We 
will discuss these viewpoints further in Lecture 33. 


SUMMARY. Karl Pearson (1857-1936) and R. A. Fisher (1890-1962)
were contemporaries and fellow pioneers in statistics during parts of
their lives. Pearson had already established his fame and reputation
before Fisher began his career. In 1884 Pearson was appointed to the
Chair of Applied Mathematics and Mechanics at University College
London. In 1901 he edited the first volume of Biometrika, and in 1911 he
established the Department of Applied Statistics at University College
London. In 1912 Fisher graduated from Cambridge and published his
first paper, and in 1915 he published his first statistical paper, on the
distribution of the sample correlation coefficient.

Between 1915 and 1919 serious differences, never to be bridged, arose
between Pearson and Fisher. From 1919 to 1933 Fisher was at 
Rothamsted establishing the foundations of experimental statistics. 
During this same period, Pearson’s department prospered at University 
College. Pearson retired in 1933 and died 3 years later. In 1933 Fisher 
became chairman of the Department of Eugenics at University College. 

Both men made enormous contributions to statistics. Yet their 
philosophies of science and of statistics were different. To some extent 
these different philosophies in statistics persist to this day. 
INFERENCE VERSUS DECISION

33.1 In 1950 the book by Wald entitled Statistical Decision Functions [3]
introduced a new era in statistical thinking. Wald’s theory places the 
statistician or the user of statistics in a decision-making situation. Data 
are to be analyzed and used to reach one of a possible set of decisions. 

This formulation, which is consistent with the Neyman-Pearson
theory of testing hypotheses (accept $H_0$ or $H_1$), has been accepted by
many statisticians as the fundamental way to describe all statistics. This
leads to the description of statistics as the science of decision making
under uncertainty. Wald apparently visualized his work as a unifying
theory of statistics. In the preface he states:

A major advance beyond previous results is the treatment of the design of
experimentation as a part of the general decision problem.*


Fisher reacted strongly to the suggestion that Wald’s decision theory 
included the design of experiments. He maintained that although 
decision theory might be suitable for industry and technology, it was not 
appropriate for scientific work. In his last book [1, p. 100] he stated:

it would still be true that the Natural Sciences can only be successfully
conducted by responsible and independent thinkers applying their minds
and their imaginations to the detailed interpretation of verifiable
observations. The idea that this responsibility can be delegated to a vast
computer programmed with Decision Functions belongs to the phantasy of
circles rather remote from scientific research.†


* Reproduced by permission of the publishers, John Wiley & Sons, Inc., from
Statistical Decision Functions by Abraham Wald, 1950.

† Reproduced by permission of the publishers, Hafner Press, MacMillan
Publishing Co., Inc., New York, from Statistical Methods and Scientific
Inference by R. A. Fisher. Copyright © 1973 by the University of Adelaide.
The process of learning from statistical analysis as opposed to
reaching a decision has come to be known as statistical inference. The 
relationship of statistical inference to statistical theory is exposited 
clearly by Menges [2], who refers to the English Fisher school when 
discussing inference. This is probably appropriate, since much of the 
philosophy of inference statistics can be traced to the ideas of Fisher. On 
the other hand, the concept of statistics as a decision science is clearly 
more directly related to the Neyman-Pearson theory. 


33.2 In earlier lectures I introduced estimation, tests, and intervals, which are 
fundamental analyses in statistical inference. Let us now consider some 
simple decision problems and give some elements of modern decision 
theory. 

We are going to buy a car that is required for our work. We expect to 
keep the car for 2 years, during which time we expect to drive 40,000 
miles. We have narrowed the choice of cars to two: car A costing $5000 
and averaging 20 miles per gallon and car B costing $6700 and averaging 
40 miles per gallon. Which car should we choose if we are concerned only 
with the total of the cost of the car and the gasoline cost for the 2 years? 
There is no way to solve the problem without including the price of the 
gasoline. Consider working the problem with three possible average 
prices of gasoline: $1 per gallon, $2 per gallon, and $3 per gallon. The 
total costs for the 2-year period are given in Table 33.1. 

We now have the information on which to base a decision. If we think 
the average price of gasoline will be about $1, we should choose car A. If 
we think it will be about $2, we should choose car B. Without knowing 
what the price of gasoline will be, what decision should we make? 

Table 33.1 Total Cost: Car Decision Problem

                                   Price of gasoline
                                 $1          $2          $3
Car A: $5,000 car
  using 2,000 gallons          $7,000      $9,000     $11,000
Car B: $6,700 car
  using 1,000 gallons          $7,700      $8,700      $9,700

One rule, called the minimax rule, is to consider the worst possible
state of affairs for each possible decision and to choose the decision for
which the worst state of affairs is best. That is, it considers the maximum
total cost for car A and for car B and chooses the car giving the minimum
of the maximum costs (minimax rule). With car A, the maximum cost is
$11,000; with car B, the maximum cost is $9700. So the minimax
decision is car B. As Menges [2] states, this rule is not generally
recommended by statisticians because it is too pessimistic.

Another rule, called the Bayes rule, requires specification of
probabilities for the prices of gasoline. Clearly, in this instance these
probabilities are measures of belief. Suppose we feel rather strongly that
the price of gasoline will remain low, so that we specify probabilities of
3/4, 1/8, and 1/8 for $1, $2, and $3 gasoline. Next we calculate the
average cost for car A and car B. The Bayes rule is to choose the car for
which the average cost is the smallest. We can easily verify that

$$\text{Ave(cost for car A)} = \tfrac{3}{4}(7000) + \tfrac{1}{8}(9000) + \tfrac{1}{8}(11{,}000) = 7750$$

$$\text{Ave(cost for car B)} = \tfrac{3}{4}(7700) + \tfrac{1}{8}(8700) + \tfrac{1}{8}(9700) = 8075$$

Then car A is the Bayes decision for the probabilities specified. Other
choices of probabilities, ones giving more weight to higher gasoline
prices, can lead to car B as the Bayes decision.
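The car example can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture: the helper names `total_cost`, `minimax_choice`, `expected_loss`, and `bayes_choice` are hypothetical. The costs are recomputed from the prices, mileage, and fuel economy given above, and the beliefs are the probabilities 3/4, 1/8, and 1/8, which reproduce the averages 7750 and 8075.

```python
# Car decision problem of Section 33.2 (helper names are hypothetical).
MILES = 40_000
CAR_PRICE = {"A": 5000, "B": 6700}   # purchase price in dollars
MPG = {"A": 20, "B": 40}             # miles per gallon
GAS_PRICES = [1, 2, 3]               # possible average prices per gallon (states)


def total_cost(car, gas_price):
    """Purchase price plus 2-year gasoline cost (the loss in Table 33.1)."""
    gallons = MILES // MPG[car]
    return CAR_PRICE[car] + gallons * gas_price


def minimax_choice(actions, states, loss):
    """Choose the action whose worst-case (maximum) loss is smallest."""
    return min(actions, key=lambda a: max(loss(a, s) for s in states))


def expected_loss(action, states, probs, loss):
    """Average loss of an action under given probabilities for the states."""
    return sum(p * loss(action, s) for s, p in zip(states, probs))


def bayes_choice(actions, states, probs, loss):
    """Choose the action with the smallest expected loss."""
    return min(actions, key=lambda a: expected_loss(a, states, probs, loss))


cars = ["A", "B"]
beliefs = [3 / 4, 1 / 8, 1 / 8]      # probabilities for $1, $2, $3 gasoline

for car in cars:
    print(car, [total_cost(car, g) for g in GAS_PRICES],
          expected_loss(car, GAS_PRICES, beliefs, total_cost))
print("minimax:", minimax_choice(cars, GAS_PRICES, total_cost))
print("Bayes:", bayes_choice(cars, GAS_PRICES, beliefs, total_cost))
```

Because both rules take the loss function as an argument, the same helpers apply to any finite table of losses, for instance the other sets of probabilities in Exercise 2.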

33.3 The previous example illustrates many of the features of a decision
problem. There is a parameter space (unknown price of gasoline), a
decision space (car A or car B), and a cost function, or loss function
(Table 33.1). However, this example does not include data that give us
some partial information about the parameter. Consider a second example
for which the formulation includes data. We will give a simplified version
of a problem that might occur in the sampling inspection of a
manufacturing firm.

When manufactured lots are of low quality but pass inspection, it is
felt that the loss in customer goodwill is considerable. Lots that do not
pass inspection are reworked; reworking a good lot costs somewhat less
than reworking a poorer one. The possible situations are portrayed in
Table 33.2.



Table 33.2 Losses

                  Fraction Defective
               .05       .10       .15
Pass             0      1000      1000
Rework         300       500       500
A sample of two items is taken from each lot, and x = number of
defective items observed. Two possible decision rules are being
considered.


Rule 1. Pass the lot if x = 0.

Rule 2. Pass the lot if x = 0 or 1.

The probabilities of passing and reworking for both rules for each
fraction defective are given in Table 33.3. By multiplying the
probabilities by the losses and summing, we can obtain, for each
fraction defective and for each decision rule, the average loss (risk).
For example, the risk for Rule 1 when the fraction defective is .10
is given by

$$\text{Risk} = (.90)^2(1000) + [1 - (.90)^2](500) = 810 + 95 = 905$$

Proceeding in this way, we obtain the full set of risks given in Table 33.4. 

Rule 1 is the minimax rule, since its largest risk (905) is smaller than
the largest risk of Rule 2 (995). Let us determine the Bayes rule if our
probabilities for fraction defective .05, .10, and .15 are .90, .05, and
.05, respectively. Using these probabilities, we obtain the expected risks.

Rule 1. 114.6375
Rule 2. 99.8625

So, for the probabilities given, the Bayes rule is Rule 2.


Table 33.3 Probabilities of Pass and Rework

                               Fraction Defective
                  .05                       .10                       .15
          Rule 1       Rule 2       Rule 1       Rule 2       Rule 1       Rule 2
Pass      (.95)^2      1 - (.05)^2  (.90)^2      1 - (.10)^2  (.85)^2      1 - (.15)^2
Rework    1 - (.95)^2  (.05)^2      1 - (.90)^2  (.10)^2      1 - (.85)^2  (.15)^2


Table 33.4 Risks

                Fraction Defective
            .05        .10        .15
Rule 1     29.25       905      861.25
Rule 2      0.75       995      988.75
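The risks in Table 33.4 and the two decisions can be reproduced with a short Python sketch. It assumes, as the text does implicitly, that the number of defectives x in a sample of two follows a binomial distribution with the lot's fraction defective. The names `prob_pass` and `risk` are hypothetical.

```python
from math import comb

FRACTIONS = [0.05, 0.10, 0.15]      # possible fraction defective (states)
LOSS = {                            # Table 33.2
    "pass":   [0, 1000, 1000],
    "rework": [300, 500, 500],
}
SAMPLE_SIZE = 2


def prob_pass(cutoff, p, n=SAMPLE_SIZE):
    """P(x <= cutoff) when x ~ Binomial(n, p): the probability the lot is passed."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(cutoff + 1))


def risk(cutoff):
    """Average loss of the rule 'pass if x <= cutoff' for each fraction defective."""
    risks = []
    for j, p in enumerate(FRACTIONS):
        pp = prob_pass(cutoff, p)
        risks.append(pp * LOSS["pass"][j] + (1 - pp) * LOSS["rework"][j])
    return risks


rules = {"Rule 1": 0, "Rule 2": 1}  # pass if x = 0; pass if x = 0 or 1
risk_table = {name: risk(c) for name, c in rules.items()}

# Minimax: smallest maximum risk over the states.
minimax_rule = min(risk_table, key=lambda r: max(risk_table[r]))

# Bayes: smallest expected risk under the stated probabilities.
prior = [0.90, 0.05, 0.05]          # probabilities for .05, .10, .15
expected_risk = {
    name: sum(w * r for w, r in zip(prior, risks))
    for name, risks in risk_table.items()
}
bayes_rule = min(expected_risk, key=expected_risk.get)

for name, risks in risk_table.items():
    print(name, [round(r, 2) for r in risks], round(expected_risk[name], 4))
print("minimax:", minimax_rule, " Bayes:", bayes_rule)
```

Rule 3 of Exercise 5 corresponds to `risk(2)`, since with a sample of two that rule always passes the lot.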
33.4 It seems that the decision theory just introduced would have all of the
necessary components to deal with parameter estimation, tests, and 
intervals. In fact, it seems that this is true only in a formal sense in 
scientific work. Researchers do not generally know what losses (or 
rewards) are likely to result from their work nor do they have a clearly 
defined set of decisions or probabilities for the possible parameter values. 
For work requiring business decisions, on the other hand, decision theory 
seems admirably suited. 


SUMMARY. Wald’s book, Statistical Decision Functions, was an
important landmark in the development of statistics as a decision-making
science. Fisher resisted attempts to systematize statistics to such
an extent, apparently preferring to regard statistics more in terms of data
analysis and formation of opinion. Learning from data instead of
making decisions is called statistical inference.

Statistical decision theory is formulated as follows. A person has
several decisions open to him or her in the face of several unknown states
of nature. For each action-state pair there is a loss (possibly negative) to
the decision maker. One of the possible decisions is made on the basis of
sample data. The expected value of the loss over possible samples is
called the risk. The decision maker wishes to minimize the risk in some
way. Two rules are discussed: the minimax rule and the Bayes rule. The
minimax rule is to take the choice for which the maximum risk is
smallest. The Bayes rule is to take the choice for which the expected risk
(with respect to a probability distribution on the states of nature) is
smallest.

Statistical decision theory has not succeeded in unifying statistical 
methods, although many statistical problems may be properly thought 
of as decision problems. 
EXERCISES


1. Given the cost information in Section 33.2, verify that the total costs
for cars A and B are as shown in Table 33.1.

2. What are the Bayes decisions in Section 33.2 for the following sets of
probabilities?

         $1      $2      $3
        1/4     1/2     1/4
        1/4     1/4     1/2
        1/8     1/8     3/4

3. Verify the risks in Table 33.4. 

4. What are the Bayes decisions in Section 33.3 for the following sets of
probabilities?

           Fraction Defective
         .05      .10      .15
a.       .95     .025     .025
b.       .90     .10       0
c.       .80     .10      .10


5. Given the loss function in Table 33.2, suppose that a sample of two
items is to be taken from each lot. Determine the risk table for the
following rules.

a. Rule 1. Pass the lot if x = 0.

b. Rule 2. Pass the lot if x = 0 or 1.

c. Rule 3. Pass the lot if x = 0, 1, or 2.

With this risk table, what are the Bayes rules for the sets of
probabilities in Exercise 4?

6. A collector is considering purchase of a painting, reputed to be by a
well-known artist. The painting is priced at $5000. If genuine, it is
worth $10,000; if false, it is worthless. Furthermore, purchase of a fake
painting or failure to buy a genuine one will damage her reputation.
The loss table is as follows.

                 Genuine (dollars)    Fake (dollars)
Buy                   -5000                6000
Do not buy             3000                   0

She goes to an appraiser who can detect genuine articles with
probability .95 and fake articles with probability .70. Determine the
risk table for the following three rules:

Rule 1. Buy with probability ^.

Rule 2. Buy if appraiser says genuine.

Rule 3. Do not buy.

7. Determine the minimax rule for the risk function in Exercise 6.

8. Determine the Bayes rule for the risk function in Exercise 6 for the
following sets of probabilities:


a. i 

b. \ 

c. I 


3 

4 ’ 

1 

2 - 

1 


MANY DIMENSIONS



CHI-SQUARE. The independence of smoking or nonsmoking and the
incidence of disease is often tested using chi-square. Copyright © Gerri
Marshall 1980 Design Conceptions.
CHI-SQUARE


34.1 In the course of these lectures a few of the well-known statistics were 
introduced, such as the correlation coefficient, “Student’s” t, and
Snedecor’s F. Although we have previously made reference to the chi- 
square statistic, we have deferred a fuller description until now. The chi- 
square distribution is a skewed distribution, indexed by one parameter, 
commonly called the degrees of freedom, and is tabulated in Table A.8. 
Statistics having the chi-square distribution arise in a variety of ways; in 
this lecture we will illustrate some of the many uses of chi-square. 


34.2 In 1900 Pearson [3] published a paper on the uses of chi-square,
illustrating its use in testing the hypothesis that a sample of data has
come from a completely specified distribution. For example, we might
have a set of measurement data for which the histogram looks somewhat
like a normal distribution with a specified mean and standard deviation.
We would like to have a test to measure the extent to which the data
support the normal hypothesis. Pearson provided such a test. The steps
involved are as follows.

1. Arbitrarily set intervals covering the range of the data.

2. Count the number of observations in each of the intervals. Denote
these counts by $O_1, O_2, \ldots, O_k$, where k is the number of intervals.

3. Calculate the probability, $P_i$, for each interval specified by the
hypothesized distribution.

4. Calculate the expected number of observations, $E_i$, for each interval
by multiplying the sample size, n, times the calculated probabilities:
$E_1 = nP_1$, $E_2 = nP_2$, etc.

5. Calculate the statistic $\sum (O_i - E_i)^2/E_i$. Pearson showed that the
distribution of this statistic is well approximated by the chi-square
distribution tabulated in Table A.8. The statistic is therefore widely
known as the chi-square statistic and is denoted by $\chi^2$.

6. Calculate the significance level from Table A.8 by finding the
probability of a larger value than that observed. Pearson used k - 1
(the number of intervals minus one) as the degrees of freedom,
regardless of whether the parameters were known or were estimated
from the data. However, Fisher [1, 2] showed that a better
approximation is given by decreasing the degrees of freedom by one
for each parameter estimated from the data.
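Steps 4 through 6 can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names, not the book's procedure. In particular, `chi2_sf` stands in for a lookup in Table A.8: it evaluates the standard closed-form expressions for the upper-tail chi-square probability when the degrees of freedom are a whole number.

```python
import math


def pearson_chi_square(observed, expected):
    """Pearson's statistic: the sum of (O_i - E_i)^2 / E_i over the k intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, n_estimated=0):
    """k - 1, reduced by one for each parameter estimated from the data (Fisher)."""
    return k - 1 - n_estimated


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x).

    Exact closed forms for a whole-number df >= 1: a finite sum of Poisson
    terms when df is even; a normal tail area plus a finite sum when df is odd.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)            # i = 0 term: e^(-x/2)
        total = term
        for i in range(1, df // 2):
            term *= half / i              # e^(-x/2) (x/2)^i / i!
            total += term
        return total
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2.0))               # two-sided normal tail at chi
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)  # normal density at chi
    term = chi                                            # chi^(2r-1)/(1*3*...*(2r-1)), r = 1
    for r in range(1, (df - 1) // 2 + 1):
        total += 2.0 * density * term
        term *= x / (2 * r + 1)
    return total
```

As a check, `chi2_sf(3.841, 1)` is about .05, the familiar 5% point for one degree of freedom, and `chi2_sf(0.792, 3)` is about .85, the "quite large" significance level found for the horsekick data in the next section.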

34.3 We will illustrate the goodness-of-fit test described in the previous 
section for two data sets. 

In Table 5.5 we presented the horsekick data of Bortkiewicz. In order
to test the hypothesis that the data come from a Poisson distribution, we
estimate the parameter by the sample mean $\bar{x} = 0.61$. Using Poisson
tables, we can then calculate probabilities and expected numbers by
multiplying these probabilities by n = 200. These quantities are given in
Table 34.1. From the last two lines of Table 34.1, we calculate the
$\chi^2$ statistic.

$$\chi^2 = \frac{(0.2)^2}{108.8} + \frac{(-1.2)^2}{66.2} + \frac{(1.8)^2}{20.2} + \frac{(-1.2)^2}{4.2} + \frac{(0.4)^2}{0.6} = 0.7920$$


From Table A.8 we see that the significance level is quite large (using
k - 1 - 1 = 3 degrees of freedom), so the hypothesis of a Poisson
distribution is acceptable.
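The arithmetic behind Table 34.1 can be checked with a short Python sketch. It uses the observed counts and the expected numbers as printed in the table. The exact Poisson probabilities at the estimated mean are computed only for comparison with the tabled $P_i$. Variable names are illustrative.

```python
import math

xs = [0, 1, 2, 3, 4]
observed = [109, 65, 22, 3, 1]                        # O_i from Table 34.1
n = sum(observed)                                     # 200
mean = sum(x * o for x, o in zip(xs, observed)) / n   # estimate of the Poisson parameter

# Exact Poisson probabilities at the estimated mean, for comparison with P_i.
poisson = [math.exp(-mean) * mean**x / math.factorial(x) for x in xs]

tabled_p = [0.544, 0.331, 0.101, 0.021, 0.003]        # P_i as printed
expected = [108.8, 66.2, 20.2, 4.2, 0.6]              # E_i = n * P_i as printed

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                            # one parameter (the mean) estimated

print(f"mean = {mean:.2f}, chi-square = {chi_sq:.4f}, df = {df}")
```

The exact probabilities agree with the tabled ones to within .001. Even so, substituting them would change the statistic noticeably, mostly through the small expected counts for x = 3 and x = 4, which is one reason small expected counts call for care.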

Consider the hypothesis that the data of Table 34.2 came from a 
normal distribution. 


Table 34.1 Calculation of Chi-Square: Poisson

x                0        1        2        3        4
O_i            109       65       22        3        1
P_i          0.544    0.331    0.101    0.021    0.003
E_i          108.8     66.2     20.2      4.2      0.6
O_i - E_i      0.2     -1.2      1.8     -1.2      0.4
Table 34.2 Distribution of Brain Weight

Brain Weight (grams)    Frequency
Under 1100                   0
1100-1150                    1
1150-1200                   10
1200-1250                   21
1250-1300                   44
1300-1350                   53
1350-1400                   86
1400-1450                   72
1450-1500                   60
1500-1550                   28
1550-1600                   25
1600-1650                   12
1650-1700                    3
1700-1750                    1
1750 and over                0

Source. Reproduced from Biometrika, 4. Pearl, Biometrical studies on man.
I. Variation and correlation in brain-weight. Copyright © 1905 by the
Biometrika Trustees. Reprinted by permission from the Biometrika Trustees.


From the data we estimate the mean and standard deviation by

$$\bar{x} = 1400.4807 \quad \text{and} \quad s = 107.30420$$

Using these estimated values, we can obtain probabilities from Table A.3
and multiply by the sample size n = 416 to obtain expected values. These
quantities are shown in Table 34.3.

From Table 34.3 we can calculate our chi-square statistic to be
$\chi^2 = 11.6945$, and from Table A.8 we see that the probability of a larger
value with k - 1 - 2 = 15 - 1 - 2 = 12 df is about .5. Therefore we
conclude that the hypothesis of a normal distribution is reasonable.
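The normal fit can be sketched in Python. Two assumptions, neither stated in the text, are made here. First, the mean and standard deviation are computed from class midpoints with divisor n, which reproduces the printed values 1400.4807 and 107.30420. Second, the interval probabilities come from the exact normal distribution function via `math.erf` rather than from a four-place table such as Table A.3. Because of the second choice, the statistic is close to, but not identical with, the value 11.6945 obtained from Table 34.3.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


cuts = list(range(1100, 1751, 50))     # class boundaries 1100, 1150, ..., 1750
observed = [0, 1, 10, 21, 44, 53, 86, 72, 60, 28, 25, 12, 3, 1, 0]  # Table 34.2
n = sum(observed)                      # 416

# Class midpoints. The open-ended end classes have zero counts, so the
# placeholder values 1075 and 1775 do not affect the estimates.
mids = [1075] + [c + 25 for c in cuts[:-1]] + [1775]

mean = sum(f * m for f, m in zip(observed, mids)) / n
s = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(observed, mids)) / n)

# P_i for the 15 classes: below 1100, the 13 interior intervals, 1750 and over.
cdf = [0.0] + [normal_cdf((c - mean) / s) for c in cuts] + [1.0]
probs = [cdf[i + 1] - cdf[i] for i in range(len(observed))]
expected = [n * p for p in probs]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 2             # mean and standard deviation estimated

print(f"mean = {mean:.4f}, s = {s:.4f}, chi-square = {chi_sq:.2f}, df = {df}")
```

Either way the statistic is near its expected value of 12 under the hypothesis, so the conclusion that a normal distribution is reasonable does not change.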


Table 34.3 Calculation of Chi-Square: Normal

Interval         O_i     P_i         E_i
Below 1100         0    .0026      1.0816
1100-1150          1    .0073      3.0368
1150-1200         10    .0208      8.6528
1200-1250         21    .0501     20.8416
1250-1300         44    .0928     38.6048
1300-1350         53    .1456     60.5696
1350-1400         86    .1808     75.2128
1400-1450         72    .1772     73.7152
1450-1500         60    .1466     60.9856
1500-1550         28    .0954     39.6864
1550-1600         25    .0494     20.5504
1600-1650         12    .0215      8.9440
1650-1700          3    .0073      3.0368
1700-1750          1    .0020       .8320
Above 1750         0    .0006       .2496


34.4 Another use of chi-square is in testing for independence of row and
column classifications in two-way tables. Again we give an example. In a
large statistics class attended by graduate students as well as
undergraduate students, grading was a major problem for the instructor.
Some undergraduates felt that the grading policy gave preferential
treatment to the graduate students. Consider the data of Table 34.4. The
number of each letter grade assigned is recorded for four different groups
of students. It is of interest to ask if the distribution of grades is
different for the different levels of students.

Table 34.4 Test for Independence
(E = expected number under independence)

                  A          B          C          D      Total
Junior            4          5          7          6         22
   E         6.3871     7.8065     5.3226     2.4839
Senior            8          6          6          4         24
   E         6.9677     8.5161     5.8065     2.7097
M.S.             16         14         11          4         45
   E        13.0645    15.9677    10.8871     5.0806
Ph.D.             8         19          6          0         33
   E         9.5806    11.7097     7.9839     3.7258
Total            36         44         30         14        124

Suppose that the probability of receiving an A is independent of the
student classification. Then

$$P(\text{junior and A}) = P(\text{junior})\,P(\text{A})$$

Under the hypothesis of independence, this probability is estimated by
(36/124)(22/124), and the expected number is estimated by multiplying
by 124. Therefore the expected number for the junior A cell is given by
(36)(22)/124 = 6.3871. We can obtain the other expected numbers in
the table similarly. Then we calculate the chi-square statistic to be
$\chi^2 = 19.0684$.

The number of degrees of freedom for a table with r rows and c
columns is simply (r - 1)(c - 1). This follows from our previous rule for
degrees of freedom, applied to the k = rc cells of the table. Under the
hypothesis of independence, we need to estimate only the row
probabilities and column probabilities. Since the row probabilities must
total 1, we need to estimate only r - 1 of these and, similarly, c - 1
column probabilities. Therefore the degrees of freedom are

$$rc - 1 - (r - 1) - (c - 1) = (r - 1)(c - 1)$$

For the current example, the degrees of freedom are 9. From Table 
A.8, the probability of a chi-square with 9 degrees of freedom exceeding 
19.0684 is about .025. Because of this relatively small value, we would 
tend to accept the idea that the distribution of grades is not independent 
of the student classification. 
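The expected counts and the statistic for Table 34.4 can be computed directly from the row and column totals. This Python sketch is illustrative, and its variable names are hypothetical. Recomputing from the counts as printed gives a statistic of about 19.08, in close agreement with the value reported above.

```python
# Table 34.4: rows = Junior, Senior, M.S., Ph.D.; columns = grades A, B, C, D.
observed = [
    [4, 5, 7, 6],
    [8, 6, 6, 4],
    [16, 14, 11, 4],
    [8, 19, 6, 0],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Under independence: expected count = (row total)(column total)/(table total).
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)   # (r - 1)(c - 1)

print(row_totals, col_totals, total)
print(f"chi-square = {chi_sq:.4f}, df = {df}")
```

The same few lines apply unchanged to the two-way tables in Exercises 7 through 11.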


34.5 In recent years other goodness-of-fit statistics have gained preference
over chi-square, yet it continues to be widely used and some knowledge
of it seems almost mandatory.


SUMMARY. Pearson provided a chi-square goodness-of-fit test to test
the conformity of a set of data to a hypothesized distribution. A
frequency distribution is constructed for a sample of data. For each of
the classes or intervals, the expected or theoretical frequencies are
calculated for the specified distribution. Then the chi-square statistic is
calculated using the formula $\sum (O - E)^2/E$, where O denotes observed
frequencies and E denotes expected frequencies. The degrees of freedom
(as modified by Fisher) are (number of classes) - 1 - (number of
parameters estimated).
Another use of the chi-square statistic is to test independence of the
two criteria of classification in a two-way table of frequency counts. In
this case, the expected frequency for any cell is (row total)(column
total)/(table total).

The degrees of freedom for the chi-square statistic are (r - 1)(c - 1),
where r = number of rows and c = number of columns.
EXERCISES

1. Verify the chi-square calculations in Table 34.1 and the resulting 
significance level for the horsekick data. 

2. Examine the goodness of fit of the Poisson distribution to 
“Student’s” yeast cell data (Table 12.2) by calculating the chi-square 
statistic and evaluating the significance level. 

3. Verify that the formula k - 1 - (number of parameters estimated)
does result in (r - 1)(c - 1) for the degrees of freedom as in Section
34.4.

4. Verify the values for $\bar{x}$ and s in Section 34.3 and verify the
calculation of $\chi^2$ in Table 34.3.

5. In the case of a continuous distribution the choice of the grouping
interval is arbitrary and has some effect on the chi-square test.
Recalculate the value of chi-square for the data of Table 34.2 using
intervals 1100-1200, 1200-1300, etc. Are you led to the same
conclusions concerning normality of the data?
6. Examine the goodness of fit of the normal distribution to the data of
Table 5.2. 

7. Sociologists often study the effect of parental attitudes on attitudes
of the children. A survey was made of high school students in a city,
and they were asked about their participation in church activities.
Church attendance was reported as follows.

                                  Children
Parents          Once a Week    Once a Month    Seldom
Once a week           74             67            27
Once a month          11             14            33
Seldom                 5             10            17

Calculate the chi-square statistic to test for independence of parental
church attendance from church attendance of the children. Determine the
significance level.

8. High school football players were asked questions to determine their 
plans to attend college after graduation. The researcher was 
interested in the possible relationship between the strength of their 
high school program and the strength of the football program at the 
college of their choice. The season records for the students’ high school
football teams and their chosen college football teams were as
follows.

                             College Team Win Record (percent)
High School Team
Win Record (percent)      0-25      25-50      50-75      75-100
0-50                       25         26         11           2
50-100                     21         23         12          17

Calculate the chi-square statistic and the significance level to test for
independence.

9. A public opinion research company surveyed attitudes of people
concerning a proposed cut in state taxes. Calculate the chi-square
statistic to test for independence and the significance level for the
following set of data.

                         Favor    Oppose
State employees           198       217
Not state employees       476       109


10. The people working in an office building were asked about the
comfort of their office (too cold, about right, too warm) in October,
January, and April. Calculate the chi-square test for independence
and significance level.

               October    January    April
Too cold          231        168       212
About right        66         89        79
Too warm          148        113       136


11. People were asked for their opinion about salaries of public school
teachers (too high, too low, about right). Some people questioned
were teachers, some had teachers in their immediate families, and
others were not closely related to teachers. Calculate the chi-square
statistic to test for independence and the significance level for the
following set of data.

               Teacher    Related    Not Related
Too high           2         102          624
About right       37         378          578
Too low          316         314          309




CHARLES EDWARD SPEARMAN, 1863-1945. Keystone Press Agency.
FACTOR ANALYSIS


35.1 Since its early beginnings with contributions from Galton [1888] and 
Pearson [1901], factor analysis has been developed and pushed to its 
present state primarily by psychologists. In 1904 Spearman [4]
presented the idea of measuring general intelligence. Suppose that we have
aptitude scores for an individual in mathematics, science, and art. Then, 
according to the general intelligence theory, these three scores would 
result from a combination of a general intelligence score for the 
individual and specific aptitude scores for the three areas. 

35.2 In order to carry the matter further we must discuss some preliminary 
concepts. Recall that a standardized variable is one that has mean zero 
and variance unity. A variable is standardized by subtracting its mean 
and then dividing by its standard deviation. In psychology as well as in 
other social sciences, much use is made of standardized variables, and we 
will generally assume in this discussion of factor analysis that the 
observation variables have been standardized. We are trying to express 
these standardized variables in terms of a basic, underlying, but 
unmeasurable general factor. 

35.3 Suppose the factor structure is known and the standardized aptitude
scores $Z_1$, $Z_2$, and $Z_3$ for mathematics, science, and art are given by
linear combinations of a standardized general intelligence score, F, and
specific aptitude scores $e_1$, $e_2$, and $e_3$ as follows.

$$
\begin{aligned}
Z_1 \text{ (mathematics)} &= .9F + e_1 \\
Z_2 \text{ (science)} &= .8F + e_2 \\
Z_3 \text{ (art)} &= .1F + e_3
\end{aligned}
$$

We also suppose that F and the es are uncorrelated.


The coefficients of F are called factor-loading coefficients, the idea
being that they describe the way $Z_1$, $Z_2$, and $Z_3$ “load onto” the
underlying factor F. Thus the mathematics and science scores are more
strongly related to the factor F than is the art score. We might therefore
associate the factor F with mathematics and science aptitudes. This idea
can be made a little more precise because the factor-loading coefficients
give the correlation coefficients between the Zs and F. (For example, F is
uncorrelated with $e_1$ and has variance 1, so the covariance of $Z_1$ and F
is .9; since both are standardized, this is also their correlation.) So the
correlation coefficient between $Z_1$ and F is .9, between $Z_2$ and F .8,
and between $Z_3$ and F .1.

The squares of the factor-loading coefficients are called communalities 
and are also directly interpretable. They give the fractions of the 
variances of the Zs accounted for by the common factor F. The 
remaining portions of the variances are called specificities. For the 
example at hand we have the following data. 



        Total
        Variance    Communality    Specificity
Z_1       1.0           .81            .19
Z_2       1.0           .64            .36
Z_3       1.0           .01            .99


We could say, for example, that 99% of the variance of the aptitude score 
for art is due specifically to art and only 1% is due to the common factor
involving mathematics, science, and art. 
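A small simulation illustrates why the factor-loading coefficients are correlations and why communality and specificity add to the total variance. It is a sketch under assumptions not made in the text: F and the specific scores are drawn as independent normal variables (the model itself requires only that they be uncorrelated), and each $e_i$ is given variance equal to its specificity, one minus the squared loading, so that each $Z_i$ has variance 1. It uses NumPy.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 200_000
loadings = np.array([0.9, 0.8, 0.1])      # mathematics, science, art
specificity = 1 - loadings**2             # .19, .36, .99

F = rng.standard_normal(n)                # standardized general factor
# Specific scores with variance equal to the specificity, independent of F.
e = rng.standard_normal((n, 3)) * np.sqrt(specificity)
Z = F[:, None] * loadings + e             # columns are Z1, Z2, Z3

variances = Z.var(axis=0)
corr_with_F = np.array([np.corrcoef(Z[:, j], F)[0, 1] for j in range(3)])

print("variances:", variances.round(3))
print("corr(Z, F):", corr_with_F.round(3))
print("communalities:", (corr_with_F**2).round(3))
```

The sample correlations come out close to .9, .8, and .1, and their squares close to the communalities .81, .64, and .01.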


35.4 The simple models suggested by Spearman’s work were soon generalized
to include more than one “general intelligence” factor. Suppose that we
had two uncorrelated general scores $F_1$ and $F_2$ and that the
standardized scores $Z_1$, $Z_2$, and $Z_3$ were given as follows.

$$
\begin{aligned}
Z_1 &= .9F_1 + .1F_2 + e_1 \\
Z_2 &= .8F_1 + .2F_2 + e_2 \\
Z_3 &= .1F_1 + .8F_2 + e_3
\end{aligned}
$$

The interpretation of the factor-loading coefficients as correlation
coefficients carries forward directly and, in fact, provides some
interpretation for the factors. The general factor $F_1$ has correlations
.9, .8, and .1 with $Z_1$, $Z_2$, and $Z_3$, while the general factor $F_2$ has
correlations of .1, .2, and .8. This suggests that $F_1$ may be a scientific
factor while $F_2$ may be an aesthetic factor.
It is also possible to calculate the communalities and specificities from
the factor-loading coefficients. This requires forming the matrix product
AA' from the matrix, A, of factor-loading coefficients. For this example,

$$
AA' =
\begin{pmatrix} .9 & .1 \\ .8 & .2 \\ .1 & .8 \end{pmatrix}
\begin{pmatrix} .9 & .8 & .1 \\ .1 & .2 & .8 \end{pmatrix}
=
\begin{pmatrix} .82 & .74 & .17 \\ .74 & .68 & .24 \\ .17 & .24 & .65 \end{pmatrix}
$$

The diagonal elements of AA' give the communalities. So, for this
two-factor model, we have:

        Total
        Variance    Communality    Specificity
Z_1       1.0           .82            .18
Z_2       1.0           .68            .32
Z_3       1.0           .65            .35

35.5 We have developed the discussion in the previous section as though we 
knew the factor-loading coefficients (the coefficients of the Fs) and as 
though we could measure the Fs and the es. In fact, the usual situation is 
that we do not know the factor-loading coefficients and we cannot 
measure the Fs and the es; we can measure only the Zs. From
measurements of the Zs we must try to estimate the coefficients, the Fs,
and the es. This can be done in many ways, and a casual survey of the
factor analysis literature will show that many methods are available. 

Estimation procedures are readily available in packages of statistical 
computer programs. 

The use of factor analysis has been successful in a number of cases,
particularly when the factors, F, are known or suspected in advance. On
the other hand, factor analysis has also been employed to analyze many
sets of data in the hope of discovering underlying factors, without
preconceived ideas of what they might be. Successful applications in these
circumstances are somewhat rare.


35.6 Another way to approach factor analysis is the use of n-dimensional
geometry. Science fiction writers invent fantasy worlds by introducing a
fourth dimension, and the idea is quite intriguing. In a very real sense we
actually live in a world of more than three dimensions, but it is not the
world of science fiction.
To illustrate the idea of multidimensional space, consider a population
of people. For each person we can obtain physical measurements
such as height, weight, waist size, etc. We can also obtain measurements
such as IQ scores and aptitude test scores. Suppose, in addition, that we
obtain quantitative measures of characteristics such as courage, tenacity,
and honesty. Then each person can be thought of as a long list of scores.
Now imagine that we have an axis for height, one for weight, one for
courage, and so on, one axis for each of the n measurements. We can plot
a person’s scores in n-dimensional space, and each person can be thought
of as a point in n-dimensional space. A population of people is a cluster
of points, like an n-dimensional balloon. A population of people similar
in nature would be a tightly packed cluster, while a population of people
with diverse natures would form a loose cluster. Presumably the
stereotypic, well-adjusted individuals are points in the center of the
cluster. The maladjusted and abnormal as well as the exceptional are
points on the fringes of the cluster.

If we have obtained three scores on each of 50 people we can conceive 
of our total set of data as 50 points in three-dimensional space. More 
generally, we will have obtained p scores on each of n people and can 
think of our data as n points in p-dimensional space. Our n points form 
some sort of cluster in p-dimensions. Typically, we hope that it is an 
elliptical cluster (a p-dimensional cigar), and we try to find the principal 
axes of the cluster. Most of the variation lies along the first principal axis, 
the next largest amount of variation along the second principal axis, etc. 
Of course, we appeal to our geometric intuition, but the methodology is 
algebraic. The method is known as principal components, and the first 
paper on the subject was by Pearson (1901). 
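The algebra behind principal components can be sketched with NumPy. Center the n × p data matrix and form its covariance matrix. The eigenvectors of that matrix are the principal axes, and the eigenvalues are the variances along those axes. The simulated cigar-shaped cloud of points below is an assumption made purely for illustration, and the function name `principal_axes` is hypothetical.

```python
import numpy as np


def principal_axes(X):
    """Return (variances, axes) for the rows of X, largest variance first.

    X is an n x p array: n points in p-dimensional space. Column j of
    `axes` is the j-th principal axis (a unit vector).
    """
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    variances, axes = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(variances)[::-1]
    return variances[order], axes[:, order]


# A hypothetical elongated cluster: 100,000 points in 3 dimensions with
# standard deviations 3, 1, and 0.5 along the coordinate axes.
rng = np.random.default_rng(seed=0)
X = rng.standard_normal((100_000, 3)) * np.array([3.0, 1.0, 0.5])

variances, axes = principal_axes(X)
print("variances along principal axes:", variances.round(2))
print("share of total variation:", (variances / variances.sum()).round(3))
print("first principal axis:", axes[:, 0].round(3))
```

The variances come out near 9, 1, and .25, so most of the variation lies along the first principal axis, which points (up to sign) along the first coordinate. With standardized variables the covariance matrix becomes the correlation matrix.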


SUMMARY. Spearman conceived of the idea that scores on tests are
combinations of a general intelligence factor and a factor specific to
mathematics, art, music, etc. This led to a factor model in which a
standardized test score is expressed as a constant times a standardized,
unmeasurable factor score plus an unmeasurable specific score.
The constant is called the factor-loading coefficient and is also the
correlation between the measurable test score and the unmeasurable
factor score. The variance of the test score is the sum of two components:
a part common to all test scores called the communality and a part
specific to a particular test score called the specificity.

The single-factor model was later generalized to allow for several
factors. The computation without a computer program remains
onerous, but many programs are available.
EXERCISES

1. $F_1$ and $F_2$ are uncorrelated factor scores (standardized variables), and
standardized scores $Z_1$, $Z_2$, and $Z_3$ are given by

Zj = .8F| -|- .IF 2 -l- Cj 

Z 2 = .IF, + JF 2 + ^2 
Z 3 = .2FI + .6F 2 + ^3 

a. What are the correlation coefficients between the Zs and the Fs? 

b. What is the basic interpretation of the Fs in terms of the Zs? 

c. Calculate the communalities and specificities. 

2. A student researcher conducted a survey of campers during his
summer vacation. He visited with other campers at different campsites
and obtained responses to a questionnaire concerning their
attitude toward vacation activities. Upon returning to the campus in
the fall, he analyzed his data using a factor analysis program from the
computing center and identified two factors. The coefficients are as
follows.


z 

F, 

Pz 

Hiking 

.56 

-.08 

Fishing 

A1 

.03 

Boating 

.39 

.10 

Swimming 

.45 

.06 

Ranger talks 

.07 

.56 

Guided tours 

.22 

.34 

Camping chores 

.14 

.41 

Campfire 

-.09 

.53 
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a. How would you interpret the factors identified? 

b. Calculate the communalities and specificities. 

3. Most computer centers have packages of statistical procedures 
available, and a factor analysis procedure is usually included as one of 
the programs. The user's manual for a factor analysis program gives 
the following factor-loading coefficients for a problem involving three 
measurements on each of five subjects. 

                 Factor
Subject      1      2      3
1           .3     .7     .1
2           .7     .1     .2
3           .8     .1     .1
4           .2     .8     .1
5           .1     .6     .1

Calculate the communalities and specificities for the three factors. 
Which subjects typify factor 1? Which subjects typify factor 2? 

4. An examination for a class of 20 students had five questions, two 
intended to measure retention of the lecture material and three 
intended to measure the ability of the students to apply the material. 
The factor-loading coefficients for the two important factors identified 
follow. 

              Factor
Question      1       2
1            .91     .02
2            .15     .78
3           -.88     .11
4            .79     .12
5            .09     .86

Which questions were intended to measure retention? Which questions 
were intended to measure ability to apply the material? 

5. The students in a large undergraduate class volunteer to take 
standard examinations in college algebra, accounting, English 
composition, American history, and chemistry. The factor-loading 
coefficients for the two major factors are given here. How would 
you interpret the two factors? 

                          Factor
Exam                     1       2
College algebra         .78     .52
Accounting              .81     .59
English composition     .76    -.61
American history        .69    -.66
Chemistry               .72     .68




P. C. MAHALANOBIS, 1893-1972. Indian Statistical Institute, Calcutta. 
DISCRIMINANT FUNCTION ANALYSIS 


36.1 There is a strong desire on the part of many to predict periods of upswing 
as well as plunges in the stock market. Attempts to do so usually make 
use of some function of previous stock market prices. Similarly, we 
would like to be able to identify the potential dropouts and the potential 
persisters among entering college freshmen by examining their high 
school records. As another example, a physician would like to be able, 
by looking at medical records, to discriminate between patients who are 
likely to have heart attacks and those who are not. Some success has been 
achieved on these and many other discrimination problems, but there is 
still a desire to make the discrimination sharper. In this lecture we 
introduce statistical discriminant functions, which have been used in 
attempts at the sort of discrimination described in these examples. 

The general approach is to use data from two or more populations in 
which the population identity of each individual is known. One then 
devises a discriminant function that does as good a job as possible of 
sorting the individuals in the sample into their correct populations. If the 
classification is very good, one can be confident of correctly classifying a 
new individual who is known to come from one of the populations but 
whose specific population is unknown. 

36.2 We will now develop some of the ideas of discriminant function analysis 
by means of an example. Consider the data in Table 36.1. IQ scores and high 
school grade point averages have been obtained for 24 college freshmen. 
Twelve withdrew from college during their freshman year, and the other 
12 continued to their sophomore year. 

Table 36.1 College Freshmen 

              Dropouts                      Persisters
        X1 (IQ)   X2 (HS average)     X1 (IQ)   X2 (HS average)
          109         2.50              107         2.54
          108         2.68              105         2.65
          113         2.54              106         2.91
          115         2.86              111         2.70
          116         2.79              112         2.86
          114         3.03              113         3.24
          119         3.00              114         3.11
          121         3.25              118         3.10
          122         3.15              118         3.26
          122         3.37              119         3.25
          124         3.35              123         3.27
          128         3.62              121         3.49
Total    1411        36.14             1367        36.38
Mean     117.58       3.01             113.92       3.03

Given these data, we set out to find a discriminant function. When we 
consider $X_1$ (IQ), our hopes begin to fade. There is a slight difference 
between the means (117.58 and 113.92) in the two samples, but a t test 
quickly reveals that the difference is not significant and could well be due 
to chance. Even if the difference were significant, the average IQ of the 
persisters is lower than that of the dropouts, and we would be inclined to 
distrust the results. 

We have a similar experience when we examine $X_2$ (high school 
grade point average). The difference in sample means (3.01 versus 3.03) 
is very slight and, as judged by a t test, of no significance. 
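The lecture does not show the t tests themselves. A minimal sketch, assuming the usual pooled-variance two-sample t test (our assumption; the lecture does not say which form was used), applied to the Table 36.1 data and compared with the Table A.5 value 2.074 for P = 0.975 and 22 degrees of freedom:

```python
import math

# Table 36.1 data
DROPOUT_IQ = [109, 108, 113, 115, 116, 114, 119, 121, 122, 122, 124, 128]
DROPOUT_HS = [2.50, 2.68, 2.54, 2.86, 2.79, 3.03, 3.00, 3.25, 3.15, 3.37, 3.35, 3.62]
PERSISTER_IQ = [107, 105, 106, 111, 112, 113, 114, 118, 118, 119, 123, 121]
PERSISTER_HS = [2.54, 2.65, 2.91, 2.70, 2.86, 3.24, 3.11, 3.10, 3.26, 3.25, 3.27, 3.49]

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

T_CRIT = 2.074  # Table A.5: P = 0.975, 22 degrees of freedom (two-sided 5% test)

for name, a, b in [("IQ", DROPOUT_IQ, PERSISTER_IQ),
                   ("HS average", DROPOUT_HS, PERSISTER_HS)]:
    t, df = pooled_t(a, b)
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: t = {t:.2f} with {df} df, {verdict} at the 5% level")
```

For IQ the statistic comes out near 1.48 and for high school average near -0.15, both well inside the critical value of 2.074.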

Our hopes dashed, we might read further about discriminant 
functions and learn that we are encouraged to examine the variables 
jointly. This we have not yet done; we have examined $X_1$ and $X_2$ only 
separately. We decide to make another effort. 


36.3 One of the simplest ways of analyzing two variables jointly is to plot the data in 
the form of a scattergram. This we have done in Figure 36.1. Once we 
have made this plot, our hopes of discriminating between the dropouts 
and persisters on the basis of $X_1$ and $X_2$ increase. We can see from the 
scattergram that the data may have come from two overlapping 
populations. We might try separating the data as much as possible into 
two clusters by a straight line. This is analogous to distinguishing 
between two long mountain ridges by looking up the valley between 
them. As long as we try to look across the ridges, we have difficulty 
distinguishing between them, just as when we look only at $X_1$ (or $X_2$) we are 
unable to distinguish between dropouts and persisters. 

[Figure 36.1: scattergram of high school average ($X_2$) against IQ ($X_1$) for the dropouts and persisters of Table 36.1, with a separating straight line] 

In Figure 36.1 we have drawn a straight line. We have done this merely 
by looking at the points, without any theoretical reason for the choice of 
line. In fact, considerable theoretical research has been done on the 
choice of line, but it is not discussed in this lecture. Note that the line 
passes through the points (100, 2.15) and (130, 3.75). We thus find 
the slope to be (3.75 - 2.15)/(130 - 100) = 1.60/30 ≈ .053. We also find the 
$X_2$-intercept to be 2.15 - (100)(.053) = -3.15. Therefore the equation 
of the straight line that we have chosen is 

$$X_2 = -3.15 + .053X_1$$

or, equivalently, 

$$.053X_1 - X_2 = 3.15$$

A point lies above the line when its $X_2$ exceeds $-3.15 + .053X_1$, so we can also determine that 

$$.053X_1 - X_2 < 3.15 \quad \text{for points above the line}$$

and 

$$.053X_1 - X_2 > 3.15 \quad \text{for points below the line}$$

We have thus arrived at a discriminant function, $.053X_1 - X_2$, and a 
classification rule: classify an individual as a dropout if $.053X_1 - X_2 > 3.15$; 
otherwise, call that student a persister. 
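The arithmetic for the line can be sketched in a few lines of Python. The helper name `line_through` is ours. The sketch also shows why the intercept depends on rounding: the lecture rounds the slope to .053 before computing the intercept, whereas the unrounded slope 1.60/30 gives a slightly different intercept.

```python
def line_through(p, q):
    """Slope and X2-intercept of the straight line through points p and q,
    each given as (X1, X2)."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

slope, intercept = line_through((100, 2.15), (130, 3.75))
print(round(slope, 4), round(intercept, 3))  # unrounded slope

# As in the lecture: round the slope first, then compute the intercept
rounded_slope = round(slope, 3)
rounded_intercept = 2.15 - 100 * rounded_slope
print(rounded_slope, round(rounded_intercept, 2))
```

With the unrounded slope (about .0533) the intercept is about -3.18; rounding the slope to .053 first reproduces the lecture's -3.15. Either way, rewriting $X_2 = b + aX_1$ as $aX_1 - X_2 = -b$ turns the line into a discriminant function with cutoff $-b$.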


36.4 We now see how well this classification rule works for the data of Table 
36.1. Of course, we can see from Figure 36.1 that the straight line leaves 
three persisters as well as three dropouts on the wrong side of the line. 
However, it is worthwhile to evaluate the chosen discriminant function 
for each of the data points. These values, with the resulting 
classifications, are given in Table 36.2. 

From Table 36.2 we see that our discriminant function performs quite 
well: it correctly classifies 9 of the 12 dropouts and 9 of the 12 persisters. 
We might summarize this performance as in Table 36.3. 
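The entries of Tables 36.2 and 36.3 can be reproduced mechanically. This Python sketch applies the lecture's function $.053X_1 - X_2$ with cutoff 3.15 to each student in Table 36.1 and tallies correct and incorrect classifications by group. The function names are our own.

```python
# Table 36.1 data: (IQ, high school average) for each student
DROPOUTS = [(109, 2.50), (108, 2.68), (113, 2.54), (115, 2.86), (116, 2.79), (114, 3.03),
            (119, 3.00), (121, 3.25), (122, 3.15), (122, 3.37), (124, 3.35), (128, 3.62)]
PERSISTERS = [(107, 2.54), (105, 2.65), (106, 2.91), (111, 2.70), (112, 2.86), (113, 3.24),
              (114, 3.11), (118, 3.10), (118, 3.26), (119, 3.25), (123, 3.27), (121, 3.49)]

def discriminant(iq, hs_avg):
    """The lecture's discriminant function .053*X1 - X2."""
    return 0.053 * iq - hs_avg

def classify(iq, hs_avg, cutoff=3.15):
    """Dropout if the function exceeds the cutoff, otherwise persister."""
    return "Dropout" if discriminant(iq, hs_avg) > cutoff else "Persister"

def performance(groups):
    """Count (correct, incorrect) classifications for each labeled group."""
    summary = {}
    for label, students in groups.items():
        correct = sum(classify(iq, hs) == label for iq, hs in students)
        summary[label] = (correct, len(students) - correct)
    return summary

print(performance({"Dropout": DROPOUTS, "Persister": PERSISTERS}))
```

The tally matches Table 36.3: 9 correct and 3 incorrect in each group. Evaluating `discriminant` student by student, rounded to three places, gives the Function columns of Table 36.2.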

Table 36.2 Discriminant Analysis 

          Dropouts                       Persisters
Function    Classification     Function    Classification
 3.277      Dropout             3.131      Persister
 3.044      Persister           2.915      Persister
 3.449      Dropout             2.708      Persister
 3.235      Dropout             3.183      Dropout
 3.358      Dropout             3.076      Persister
 3.012      Persister           2.749      Persister
 3.307      Dropout             2.932      Persister
 3.163      Dropout             3.154      Dropout
 3.316      Dropout             2.994      Persister
 3.096      Persister           3.057      Persister
 3.222      Dropout             3.249      Dropout
 3.164      Dropout             2.923      Persister

Table 36.3 Performance of Discriminant Function 

                       Classification
Group         Number Correct    Number Incorrect
Dropout             9                  3
Persister           9                  3

36.5 With more than two variables, the simple graphical method of this lecture 
will not work. However, we can still use it to aid our intuition. We can imagine 
two overlapping clusters of points in p dimensions and try to separate the 
two clusters by passing a plane between them. Of course, this is actually 
done algebraically, and the plane corresponds to a linear combination of our 
variables. Computer programs are available that will calculate 
discriminant functions. A full understanding of what is taking place requires 
a knowledge of statistics and p-dimensional geometry, but discriminant 
function analysis can be useful to one familiar only with the simple ideas 
in this lecture. 
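The lecture's line was drawn by eye. One standard algebraic way of choosing it, which the lecture does not develop, is Fisher's linear discriminant (see Fisher [2]). It takes coefficients proportional to the inverse of the pooled covariance matrix times the difference between the group mean vectors, and it cuts halfway between the projected group means. A minimal two-variable Python sketch on the Table 36.1 data follows. It assumes this common textbook form of Fisher's method (groups with equal covariance, midpoint cutoff), and the function names are our own.

```python
# Table 36.1 data: (IQ, high school average)
DROPOUTS = [(109, 2.50), (108, 2.68), (113, 2.54), (115, 2.86), (116, 2.79), (114, 3.03),
            (119, 3.00), (121, 3.25), (122, 3.15), (122, 3.37), (124, 3.35), (128, 3.62)]
PERSISTERS = [(107, 2.54), (105, 2.65), (106, 2.91), (111, 2.70), (112, 2.86), (113, 3.24),
              (114, 3.11), (118, 3.10), (118, 3.26), (119, 3.25), (123, 3.27), (121, 3.49)]

def mean_vector(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Sums of squares and cross-products about the group mean m."""
    s11 = sum((p[0] - m[0]) ** 2 for p in points)
    s12 = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    s22 = sum((p[1] - m[1]) ** 2 for p in points)
    return s11, s12, s22

def fisher_discriminant(group_a, group_b):
    """Return (w1, w2, cutoff): assign x to group_a when w1*x1 + w2*x2 > cutoff."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    dof = len(group_a) + len(group_b) - 2
    s11, s12, s22 = ((u + v) / dof for u, v in zip(sa, sb))  # pooled covariance
    det = s11 * s22 - s12 * s12
    d1, d2 = ma[0] - mb[0], ma[1] - mb[1]
    # w = S^{-1} d, using the explicit inverse of a 2-by-2 matrix
    w1 = (s22 * d1 - s12 * d2) / det
    w2 = (-s12 * d1 + s11 * d2) / det
    # Midpoint of the two projected group means
    cutoff = (w1 * (ma[0] + mb[0]) + w2 * (ma[1] + mb[1])) / 2
    return w1, w2, cutoff

w1, w2, cutoff = fisher_discriminant(DROPOUTS, PERSISTERS)
# Rescale so the coefficient of X2 is -1, as in .053*X1 - X2.
# w2 is negative for these data, so dividing by -w2 keeps the inequality direction.
scale = -w2
print(f"{w1 / scale:.3f}*X1 - X2 > {cutoff / scale:.2f}  =>  dropout")
```

After rescaling, the coefficient of $X_1$ comes out near .059, close to the .053 of the line drawn by eye.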


36.6 Discriminant functions have been used widely by paleontologists. Given 
a bone fragment (jawbone, elbow, etc.) judged to be quite old by all 
dating techniques, we are interested in whether or not the fragment is 
human. McHenry [5] gives a detailed discussion of the matter. 

36.7 In 1936 both Mahalanobis [4] and Fisher [2] published important 
papers contributing to the development of discriminant functions. 
SUMMARY. Frequently sample data are available from two 
populations that are known to be different. It is desired to find a function of the 
data that will have rather different values for the two samples. If this 
function shows the samples to have come from different populations, the 
hope is that it will also classify future observations by population. 

The basic idea of linear discriminant functions can be obtained by 
plotting observations from two overlapping populations. When each 
variable is regarded separately, it may be impossible to see any 
separation. However, when two or more variables are considered jointly, 
it may be that a straight line (or plane) separates the populations fairly 
well. The equation for the line gives a linear discriminant function. 
EXERCISES 

1. Verify the discriminant function values and classifications given in 
Table 36.2. 

2. The following data on state finances were given by the U.S. Census 
Bureau for the year 1973. All figures are in millions of dollars. 
State              Revenue         Debt            Expenditure
                   (per capita)    (per capita)    (per capita)

Southeast
  Alabama              478             848              454
  Arkansas             449             109              397
  Florida              460           1,260              430
  Georgia              484             755              471
  Kentucky             531           1,816              505
  Louisiana            565           1,214              554
  Mississippi          529             584              495
  North Carolina       495             528              451
  South Carolina       498             603              469
  Tennessee            422             642              386
  Virginia             479             388              464
  West Virginia        595             842              582

Northeast
  Connecticut          560           2,580              526
  Maine                531             358              526
  Massachusetts        582           3,105              602
  New Hampshire        408             179              420
  New Jersey           456           2,755              443
  New York             756          11,801              721
  Pennsylvania         532           4,596              530
  Rhode Island         587             391              553
  Vermont              742             421              754

Source. U.S. Bureau of the Census, Government Finances in 1972-1973. 


Plot a scattergram of revenue versus debt and try to obtain a linear 
discriminant function to discriminate between northeastern states 
and southeastern states. Also try revenue versus expenditure and debt 
versus expenditure. Using what seems to be the most successful 
discriminant function, construct tables similar to Tables 36.2 and 
36.3. 

3. Do the best job possible of correctly identifying by population the 
following samples, using a rule of the form: if $X_1 < C$, classify in 
one population; if $X_1 > C$, classify in the other population. Do the 
same using $X_2$ and then $X_3$. Next use large and small values of 
$2X_1 + X_2 + 3X_3$. 
Table A.1 Random Numbers 
Table A.2 Binomial Probabilities 
Table A.3 Cumulative Normal Distribution 
Table A.4 Poisson Probabilities 
Table A.5 "Student's" t Distribution 
Table A.6 Values of the Correlation Coefficient for Different Levels of Significance 
Table A.7 The F Distribution 
Table A.8 The Chi-Square Distribution 
Table A.1 Random Numbers 


15 77 01 64 69 
85 40 51 40 10 

47 69 35 90 95 

13 26 87 40 20 
10 55 33 20 47 

05 06 67 26 77 
65 50 89 18 74 
59 68 53 31 55 

31 31 05 36 48 

91 59 46 44 45 

63 59 73 21 67 
89 72 47 46 94 
70 51 21 03 18 

14 15 99 60 44 

92 46 90 39 99 

81 23 17 13 01 

87 54 42 46 56 
74 73 84 98 13 
94 55 14 00 97 
69 21 94 26 20 

82 36 36 89 29 
25 06 22 30 87 

82 37 97 60 92 

83 71 07 22 15 
73 13 79 15 12 

91 28 00 57 30 
33 47 55 62 57 
56 66 25 32 38 

88 40 52 02 29 
87 63 88 23 62 

32 25 21 15 08 
44 61 88 23 13 
94 44 08 67 79 
13 24 40 09 00 
78 27 84 05 99 

42 39 30 02 34 
04 52 43 96 38 
82 85 77 30 16 
38 48 84 88 24 
91 19 05 68 22 

54 81 87 21 31 

65 43 75 12 91 
49 98 71 31 80 
03 98 68 89 39 
56 04 21 34 92 

48 09 36 95 36 
23 97 10 96 57 

43 97 55 45 98 
40 05 08 50 79 

66 97 10 69 02 


69 58 40 81 16 

15 33 94 11 65 

16 17 45 86 29 

40 81 46 08 09 

54 16 86 11 16 

14 85 40 52 68 
42 07 50 15 69 
73 47 16 49 79 

75 16 00 21 11 

49 25 36 12 07 

80 00 25 58 25 
78 56 10 65 97 

50 21 99 49 73 
62 72 38 18 36 
64 08 00 97 27 

37 57 92 16 34 
28 89 02 06 98 
11 48 25 33 39 
32 51 92 47 03 

73 90 70 92 76 

87 70 08 71 98 
87 44 48 90 91 

76 39 17 84 34 

17 55 56 82 62 

18 34 22 24 75 

92 12 38 95 21 
08 21 77 31 05 

64 70 26 27 67 
82 69 34 50 21 

51 07 69 59 02 

82 34 57 57 35 
01 59 47 64 04 

41 61 41 15 60 

65 46 38 61 12 
85 75 67 80 05 

99 46 68 45 15 
13 83 80 72 34 
69 32 46 46 30 

55 46 48 60 06 

58 04 63 21 16 

40 46 17 62 63 
20 36 25 57 92 

59 57 32 43 07 
71 87 32 14 99 
89 81 52 15 12 

20 82 53 32 89 

74 07 95 26 44 
35 69 45 96 80 
89 58 19 86 48 
25 36 43 71 76 


60 20 00 84 22 
57 62 94 04 99 
16 70 48 02 00 
74 99 16 92 99 

59 34 71 55 84 

60 41 94 98 18 
86 97 40 25 88 
69 80 76 16 60 
42 44 84 46 84 
25 90 89 55 25 

72 06 12 86 74 
84 79 42 31 49 
06 99 19 24 96 

63 92 61 55 93 
54 96 63 40 54 

15 80 90 25 64 
59 90 74 13 38 
27 36 08 99 57 
92 33 73 20 21 
49 14 60 34 43 

49 00 89 89 99 
38 53 10 60 29 
67 65 52 89 90 

88 83 86 38 14 

56 47 45 22 81 

15 70 78 50 88 

64 74 04 93 42 
77 40 04 34 63 
74 00 91 27 52 

89 49 14 98 53 

22 03 33 48 84 
99 59 96 20 30 
11 88 83 24 82 

90 62 41 11 59 

57 05 71 70 21 

19 74 15 50 17 

20 84 56 19 49 

84 20 68 72 98 
90 08 83 83 98 

23 38 25 43 32 

99 71 14 12 64 
33 65 95 48 75 

85 06 64 75 27 
42 10 25 37 30 
84 11 12 66 87 

92 68 50 88 17 

93 08 43 30 41 
46 26 39 96 33 
27 98 99 24 08 
00 67 56 12 69 


28 26 46 66 36 
05 57 22 71 77 

59 33 93 28 58 
85 19 01 23 11 
03 48 17 60 13 

62 20 94 03 71 
14 17 73 92 07 

58 53 07 04 53 
83 20 49 17 12 
83 47 17 23 93 

54 79 70 85 88 
94 15 31 13 09 

39 43 10 14 12 
77 66 82 10 91 
34 70 27 48 18 

67 77 29 95 84 
98 66 23 20 23 

60 42 88 68 25 

29 77 37 06 98 
90 51 72 11 07 

29 08 02 72 32 

40 07 58 97 84 

62 97 04 33 81 

63 89 39 81 90 

30 82 38 34 52 

01 07 90 72 77 
20 19 09 71 46 
98 99 89 31 16 
98 72 03 45 65 

41 92 36 07 76 

37 37 29 38 37 
87 31 33 69 45 
24 07 78 61 89 

85 18 42 61 29 

31 99 99 06 96 

44 80 13 86 38 

59 14 85 42 99 
94 62 63 59 44 
40 90 88 25 26 
98 94 65 35 35 

51 68 50 60 78 
00 06 65 25 90 
29 17 06 11 30 
08 27 75 43 97 
47 21 06 86 08 

37 92 02 23 43 

86 45 74 33 78 

60 20 73 30 79 
94 19 15 81 29 
07 89 55 63 31 


86 66 17 34 49 
99 68 12 11 14 
34 32 24 34 07 

74 00 79 41 69 
38 71 23 91 83 

60 26 45 17 92 

93 11 93 45 15 
66 94 94 18 13 

21 93 34 61 16 
99 56 14 39 16 

71 58 21 98 48 
45 43 03 82 81 

94 08 55 54 70 
81 51 67 01 47 
68 59 91 83 32 

80 84 84 87 22 

90 55 31 83 48 

22 89 67 83 16 
64 63 34 31 43 

75 94 19 49 40 

68 16 29 82 19 
09 04 33 56 72 

91 27 56 46 35 
25 62 58 68 87 

57 48 30 34 17 

99 53 04 34 73 
37 32 69 69 89 
12 90 50 28 96 
30 89 71 45 91 
85 37 84 37 47 

89 76 25 09 69 

58 48 00 83 48 
42 58 88 22 16 
88 76 04 21 80 

53 99 25 13 63 

40 45 82 13 44 
71 16 34 33 79 
00 89 06 15 87 
85 74 55 80 85 
16 91 07 12 43 

22 69 51 98 37 

16 29 34 14 43 
68 70 97 87 21 

54 20 69 93 50 
35 39 52 28 09 

63 24 69 80 91 
84 33 38 76 73 

17 19 03 47 28 
82 14 35 88 03 
50 72 20 33 36 
Table A.1 (Continued) 


15 62 38 72 92 
77 81 15 14 67 
18 87 05 09 96 
08 58 53 63 66 

16 07 79 57 61 


03 76 09 30 75 
55 24 22 20 55 
45 14 72 41 46 
13 07 04 48 71 
42 19 68 15 12 


77 80 04 24 54 
36 93 67 69 37 
12 67 46 72 02 
39 07 46 96 40 
60 21 59 12 07 


67 60 10 79 26 
72 22 43 46 32 
59 06 17 49 12 
20 86 79 11 81 
04 99 88 22 39 


21 60 03 48 14 
56 15 75 25 12 

73 28 23 52 48 

74 11 15 23 17 

75 16 69 13 84 


54 13 05 46 17 
95 27 23 17 39 
22 39 44 74 80 
69 95 21 30 11 
75 75 63 97 12 


05 51 24 53 57 
80 24 44 48 93 
25 95 28 63 90 
98 81 38 00 53 
11 57 05 86 52 


46 51 14 39 17 
75 94 77 09 23 
41 19 48 46 72 
41 40 04 16 78 
82 72 47 72 14 


21 39 89 07 35 
48 75 91 69 03 
51 12 97 39 83 
67 29 83 41 18 
37 72 69 75 48 


47 87 44 36 62 
55 51 09 74 47 
35 83 23 17 29 
30 90 44 37 64 
72 21 52 51 81 


08 74 79 30 80 
04 88 45 98 60 
97 35 74 05 75 
53 09 93 28 29 
26 36 68 48 09 


70 11 66 79 25 
90 92 74 77 87 
42 13 49 48 38 
80 19 68 30 45 
37 69 26 22 80 


88 01 94 52 31 
40 18 65 87 37 
74 19 06 42 60 
94 49 49 71 21 
23 34 10 45 70 


38 57 98 71 62 
08 68 62 39 52 
20 79 90 81 77 
93 93 71 30 34 
83 51 07 37 44 


12 56 61 01 54 
84 74 90 68 18 
18 51 71 27 27 
52 65 83 40 13 
62 96 74 42 64 


49 16 57 15 79 
03 51 79 78 74 
21 88 87 28 48 
56 41 73 33 41 
72 39 19 70 17 


56 63 22 94 28 
75 23 73 75 98 
23 44 03 03 80 
59 16 59 50 98 
01 04 01 22 33 


11 39 69 55 38 
47 85 07 26 02 
53 89 07 87 93 
24 24 87 06 75 
04 84 63 27 65 


53 06 97 20 42 
61 28 01 22 16 
30 17 84 17 74 
99 52 09 88 05 
84 39 45 55 31 


09 14 90 43 48 
14 12 15 67 22 
16 53 31 39 01 
86 25 43 50 94 
95 88 93 90 37 


97 28 25 81 49 
18 87 02 72 08 
53 40 11 75 45 
60 49 03 41 56 
09 16 12 75 04 


71 69 22 04 51 
74 52 16 03 82 
13 56 85 31 37 
78 33 77 28 92 
39 69 95 00 48 


56 46 56 15 10 

20 19 66 23 62 
09 17 71 96 79 

21 90 10 62 01 
26 85 28 73 08 


69 59 99 50 29 
37 51 04 89 31 
39 50 79 27 62 
97 06 45 01 19 
66 92 10 66 75 


33 50 16 93 09 
32 19 59 85 57 
71 14 95 53 03 
95 12 24 18 52 
62 61 27 82 57 


64 20 19 87 54 
31 28 07 58 77 
80 04 28 47 76 
24 60 22 51 19 
59 16 11 26 29 


88 15 12 54 24 
03 98 26 76 09 
35 73 67 78 28 
34 54 08 24 73 
18 97 78 44 43 


06 99 57 07 28 
10 44 57 61 28 
09 39 88 63 74 
86 72 11 44 69 
58 92 78 70 80 


51 34 54 98 50 
60 29 85 70 79 
41 26 92 42 33 
76 90 81 17 85 
09 65 32 68 26 


70 88 02 86 48 
80 29 19 98 92 
06 80 06 33 84 
57 47 35 16 84 
65 73 90 50 46 


58 54 29 98 27 
20 18 34 22 73 
53 90 46 56 19 
97 16 93 94 65 
72 55 71 70 92 


40 51 92 07 13 
57 40 67 17 28 
50 58 33 84 53 
70 95 95 83 20 
04 22 53 19 29 


58 41 59 56 94 
63 57 74 36 18 
14 74 17 40 73 
91 42 57 95 63 
67 29 13 56 70 


16 32 51 42 54 
65 55 25 50 68 
86 11 04 02 04 
00 86 29 02 53 
45 73 45 05 04 


77 37 13 85 19 
35 90 00 03 38 
02 28 49 62 36 
02 27 86 70 95 
32 43 30 93 41 


99 19 72 58 35 
48 21 49 72 97 
52 37 68 15 53 
97 50 52 53 52 
36 05 09 18 11 


49 09 26 00 74 
79 19 64 81 82 
22 98 30 16 31 
26 78 21 68 69 
71 01 63 17 60 


26 42 94 52 02 
78 92 51 96 51 
83 24 87 69 29 
57 79 42 40 89 
11 65 19 43 07 


83 31 85 65 66 
28 79 13 20 82 
24 85 44 25 50 
55 81 75 24 52 
44 86 19 58 92 


31 97 67 52 15 
34 81 39 46 86 
75 62 83 95 41 
51 32 79 97 05 
23 71 32 96 19 


20 79 70 09 30 
13 07 89 72 08 
94 26 82 37 43 
13 55 88 38 43 
02 44 24 97 71 

34 90 96 63 54 
13 67 06 34 98 
18 75 55 82 66 
91 25 52 57 15 
76 24 00 14 92 


81 14 53 80 93 
00 37 75 14 94 
34 23 00 14 50 
75 37 43 83 85 
97 93 12 70 89 

22 84 36 38 99 
04 20 80 12 54 
34 77 27 71 79 
21 54 40 05 50 
14 29 12 17 73 


71 94 10 18 14 83 

83 85 06 72 66 07 

96 85 41 17 71 69 

53 74 54 62 99 68 

42 52 33 24 91 05 

85 36 25 03 27 49 

01 18 54 20 76 92 

67 65 85 92 68 16 

67 51 66 45 69 84 

77 46 44 24 30 48 


69 76 53 25 
47 30 17 11 
20 15 98 82 
93 74 43 95 
87 53 15 77 

24 72 10 50 
10 47 04 65 
43 83 18 74 
72 74 32 30 
50 36 30 24 


27 36 65 65 05 

16 02 63 97 30 
79 69 68 50 31 
06 26 79 78 87 
49 92 83 97 80 

95 14 18 26 64 
54 45 82 42 90 
12 48 68 87 22 

17 70 40 90 24 
93 08 01 39 37 

Source. Statistical Tables and Formulas by A. Hald. Copyright © 1952 by 
John Wiley & Sons, Inc., New York. Reprinted by permission of John Wiley & Sons, Inc. 



Table A.2 Binomial Probabilities 

Table A.3 Cumulative Normal Distribution 

Table A.4 Poisson Probabilities 


A table of $e^{-\lambda}\lambda^x/x!$ for $\lambda$ = 0.1(0.1)2(0.2)4(1)10 

 λ \ x    0      1      2      3      4      5      6      7      8      9      10     11
  0.1   .9048  .0905  .0045  .0002  .0000
  0.2   .8187  .1637  .0164  .0011  .0001  .0000
  0.3   .7408  .2222  .0333  .0033  .0002  .0000
  0.4   .6703  .2681  .0536  .0072  .0007  .0001  .0000
  0.5   .6065  .3033  .0758  .0126  .0016  .0002  .0000
  0.6   .5488  .3293  .0988  .0198  .0030  .0004  .0000
  0.7   .4966  .3476  .1217  .0284  .0050  .0007  .0001  .0000
  0.8   .4493  .3595  .1438  .0383  .0077  .0012  .0002  .0000
  0.9   .4066  .3659  .1647  .0494  .0111  .0020  .0003  .0000
  1.0   .3679  .3679  .1839  .0613  .0153  .0031  .0005  .0001  .0000
  1.1   .3329  .3662  .2014  .0738  .0203  .0045  .0008  .0001  .0000
  1.2   .3012  .3614  .2169  .0867  .0260  .0062  .0012  .0002  .0000
  1.3   .2725  .3543  .2303  .0998  .0324  .0084  .0018  .0003  .0001  .0000
  1.4   .2466  .3452  .2417  .1128  .0395  .0111  .0026  .0005  .0001  .0000
  1.5   .2231  .3347  .2510  .1255  .0471  .0141  .0035  .0008  .0001  .0000
  1.6   .2019  .3230  .2584  .1378  .0551  .0176  .0047  .0011  .0002  .0000
  1.7   .1827  .3106  .2640  .1496  .0636  .0216  .0061  .0015  .0003  .0001  .0000
  1.8   .1653  .2975  .2678  .1607  .0723  .0260  .0078  .0020  .0005  .0001  .0000
  1.9   .1496  .2842  .2700  .1710  .0812  .0309  .0098  .0027  .0006  .0001  .0000
  2.0   .1353  .2707  .2707  .1804  .0902  .0361  .0120  .0034  .0009  .0002  .0000
  2.2   .1108  .2438  .2681  .1966  .1082  .0476  .0174  .0055  .0015  .0004  .0001  .0000
  2.4   .0907  .2177  .2613  .2090  .1254  .0602  .0241  .0083  .0025  .0007  .0002  .0000
  2.6   .0743  .1931  .2510  .2176  .1414  .0735  .0319  .0118  .0038  .0011  .0003  .0001
  2.8   .0608  .1703  .2384  .2225  .1557  .0872  .0407  .0163  .0057  .0018  .0005  .0001
  3.0   .0498  .1494  .2240  .2240  .1680  .1008  .0504  .0216  .0081  .0027  .0008  .0002
  3.2   .0408  .1304  .2087  .2226  .1781  .1140  .0608  .0278  .0111  .0040  .0013  .0004
  3.4   .0334  .1135  .1929  .2186  .1858  .1264  .0716  .0348  .0148  .0056  .0019  .0006
  3.6   .0273  .0984  .1771  .2125  .1912  .1377  .0826  .0425  .0191  .0076  .0028  .0009
  3.8   .0224  .0850  .1615  .2046  .1944  .1477  .0936  .0508  .0241  .0102  .0039  .0013
  4.0   .0183  .0733  .1465  .1954  .1954  .1563  .1042  .0595  .0298  .0132  .0053  .0019
  5.0   .0067  .0337  .0842  .1404  .1755  .1755  .1462  .1044  .0653  .0363  .0181  .0082
  6.0   .0025  .0149  .0446  .0892  .1339  .1606  .1606  .1377  .1033  .0688  .0413  .0225
  7.0   .0009  .0064  .0223  .0521  .0912  .1277  .1490  .1490  .1304  .1014  .0710  .0452
  8.0   .0003  .0027  .0107  .0286  .0573  .0916  .1221  .1396  .1396  .1241  .0993  .0722
  9.0   .0001  .0011  .0050  .0150  .0337  .0607  .0911  .1171  .1318  .1318  .1186  .0970
 10.0   .0000  .0005  .0023  .0076  .0189  .0378  .0631  .0901  .1126  .1251  .1251  .1137
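Each entry of Table A.4 is the Poisson probability $e^{-\lambda}\lambda^x/x!$, so any row of the table can be regenerated directly. The following is a minimal Python sketch; the function name `poisson_row` is ours, and rounding to four places mimics the table.

```python
import math

def poisson_row(lam, max_x=11):
    """Poisson probabilities e^(-lam) * lam^x / x! for x = 0..max_x,
    rounded to four decimal places like Table A.4."""
    return [round(math.exp(-lam) * lam ** x / math.factorial(x), 4)
            for x in range(max_x + 1)]

print(poisson_row(1.0))
```

For example, the row for λ = 1.0 begins .3679, .3679, .1839, .0613, in agreement with the table.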


Table A.4 (Continued) 
Table A.5 "Student's" t Distribution 

Values of $t_p$ such that $P = \Pr(t \le t_p)$, where $t$ has Student's t distribution with $\nu$ degrees of freedom 

  ν \ P   0.750   0.900   0.950   0.975    0.990    0.995    0.999    0.9995
    1     1.000   3.078   6.314   12.706   31.821   63.657   318.31   636.62
    2     0.816   1.886   2.920    4.303    6.965    9.925   22.326   31.598
    3     0.765   1.638   2.353    3.182    4.541    5.841   10.213   12.924
    4     0.741   1.533   2.132    2.776    3.747    4.604    7.173    8.610
    5     0.727   1.476   2.015    2.571    3.365    4.032    5.893    6.869
    6     0.718   1.440   1.943    2.447    3.143    3.707    5.208    5.959
    7     0.711   1.415   1.895    2.365    2.998    3.499    4.785    5.408
    8     0.706   1.397   1.860    2.306    2.896    3.355    4.501    5.041
    9     0.703   1.383   1.833    2.262    2.821    3.250    4.297    4.781
   10     0.700   1.372   1.812    2.228    2.764    3.169    4.144    4.587
   11     0.697   1.363   1.796    2.201    2.718    3.106    4.025    4.437
   12     0.695   1.356   1.782    2.179    2.681    3.055    3.930    4.318
   13     0.694   1.350   1.771    2.160    2.650    3.012    3.852    4.221
   14     0.692   1.345   1.761    2.145    2.624    2.977    3.787    4.140
   15     0.691   1.341   1.753    2.131    2.602    2.947    3.733    4.073
   16     0.690   1.337   1.746    2.120    2.583    2.921    3.686    4.015
   17     0.689   1.333   1.740    2.110    2.567    2.898    3.646    3.965
   18     0.688   1.330   1.734    2.101    2.552    2.878    3.610    3.922
   19     0.688   1.328   1.729    2.093    2.539    2.861    3.579    3.883
   20     0.687   1.325   1.725    2.086    2.528    2.845    3.552    3.850
   21     0.686   1.323   1.721    2.080    2.518    2.831    3.527    3.819
   22     0.686   1.321   1.717    2.074    2.508    2.819    3.505    3.792
   23     0.685   1.319   1.714    2.069    2.500    2.807    3.485    3.767
   24     0.685   1.318   1.711    2.064    2.492    2.797    3.467    3.745
   25     0.684   1.316   1.708    2.060    2.485    2.787    3.450    3.725
   26     0.684   1.315   1.706    2.056    2.479    2.779    3.435    3.707
   27     0.684   1.314   1.703    2.052    2.473    2.771    3.421    3.690
   28     0.683   1.313   1.701    2.048    2.467    2.763    3.408    3.674
   29     0.683   1.311   1.699    2.045    2.462    2.756    3.396    3.659
   30     0.683   1.310   1.697    2.042    2.457    2.750    3.385    3.646
   40     0.681   1.303   1.684    2.021    2.423    2.704    3.307    3.551
   60     0.679   1.296   1.671    2.000    2.390    2.660    3.232    3.460
  120     0.677   1.289   1.658    1.980    2.358    2.617    3.160    3.373
    ∞     0.674   1.282   1.645    1.960    2.326    2.576    3.090    3.291


Source. Reprinted from Table F, Statistics and Experimental Design in Engineering 
and the Physical Sciences by N. L. Johnson and F. C. Leone. Copyright © 1964 by John 
Wiley & Sons, Inc. ... Pearson and Hartley, Biometrika Tables for Statisticians, Vol. 1 (1958), 
p. 138, ... Fisher and Yates, Statistical Tables for Biological, Agricultural and Medical 
Research, ... by Longman Group Ltd., London (previously published by Oliver & Boyd, 
Edinburgh), and by permission of authors and publishers. 
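The entries of Table A.5 can be spot-checked numerically. This Python sketch integrates Student's t density by Simpson's rule and uses the density's symmetry about zero. It is only a checking aid under those assumptions, not the method by which the published table was computed, and the function names are ours.

```python
import math

def t_density(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(t, nu, steps=2000):
    """P(T <= t) via Simpson's rule on [0, |t|]; steps must be even."""
    if t == 0:
        return 0.5
    b = abs(t)
    h = b / steps
    total = t_density(0.0, nu) + t_density(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t > 0 else 0.5 - area

print(round(t_cdf(2.228, 10), 4))  # Table A.5 lists 2.228 for P = 0.975, nu = 10
```

Because the density is symmetric, only the integral from 0 to |t| is needed, and the lower tail for negative t follows as 0.5 minus that area.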
Source. Reproduced by permission of the publishers, Longman Group Limited, London, from Fisher & Yates, Statistical Tables 
for Biological, Agricultural, and Medical Research, 6th ed., 1974. 
Table A.8 The Chi-Square Distribution 

Values of $\chi^2_p$ such that 

$$P = \frac{1}{2^{\nu/2}\Gamma(\nu/2)} \int_0^{\chi^2_p} y^{\nu/2-1} e^{-y/2}\,dy$$

  ν \ P    0.005     0.010     0.025     0.050     0.100     0.250     0.500
    1     0.00004   0.00016   0.00098   0.00393   0.01579   0.1015    0.4549
    2     0.0100    0.0201    0.0506    0.1026    0.2107    0.5754    1.386
    3     0.0717    0.1148    0.2158    0.3518    0.5844    1.213     2.366
    4     0.2070    0.2971    0.4844    0.7107    1.064     1.923     3.357
    5     0.4117    0.5543    0.8312    1.145     1.610     2.675     4.351
    6     0.6757    0.8721    1.237     1.635     2.204     3.455     5.348
    7     0.9893    1.239     1.690     2.167     2.833     4.255     6.346
    8     1.344     1.646     2.180     2.733     3.490     5.071     7.344
    9     1.735     2.088     2.700     3.325     4.168     5.899     8.343
   10     2.156     2.558     3.247     3.940     4.865     6.737     9.342
   11     2.603     3.053     3.816     4.575     5.578     7.584    10.34
   12     3.074     3.571     4.404     5.226     6.304     8.438    11.34
   13     3.565     4.107     5.009     5.892     7.041     9.299    12.34
   14     4.075     4.660     5.629     6.571     7.790    10.17     13.34
   15     4.601     5.229     6.262     7.261     8.547    11.04     14.34
   16     5.142     5.812     6.908     7.962     9.312    11.91     15.34
   17     5.697     6.408     7.564     8.672    10.09     12.79     16.34
   18     6.265     7.015     8.231     9.390    10.86     13.68     17.34
   19     6.844     7.633     8.907    10.12     11.65     14.56     18.34
   20     7.434     8.260     9.591    10.85     12.44     15.45     19.34
   21     8.034     8.897    10.28     11.59     13.24     16.34     20.34
   22     8.643     9.542    10.98     12.34     14.04     17.24     21.34
   23     9.260    10.20     11.69     13.09     14.85     18.14     22.34
   24     9.886    10.86     12.40     13.85     15.66     19.04     23.34
   25    10.52     11.52     13.12     14.61     16.47     19.94     24.34
   26    11.16     12.20     13.84     15.38     17.29     20.84     25.34
   27    11.81     12.88     14.57     16.15     18.11     21.75     26.34
   28    12.46     13.56     15.31     16.93     18.94     22.66     27.34
   29    13.12     14.26     16.05     17.71     19.77     23.57     28.34
   30    13.79     14.95     16.79     18.49     20.60     24.48     29.34
   40    20.71     22.16     24.43     26.51     29.05     33.66     39.34
   50    27.99     29.71     32.36     34.76     37.69     42.94     49.33
   60    35.53     37.48     40.48     43.19     46.46     52.29     59.33
   70    43.28     45.44     48.76     51.74     55.33     61.70     69.33
   80    51.17     53.54     57.15     60.39     64.28     71.14     79.33
   90    59.20     61.75     65.65     69.13     73.29     80.62     89.33
  100    67.33     70.06     74.22     77.93     82.36     90.13     99.33


Source. Reprinted from Table E, Statistics and Experimental Design in Engineering and the Physical 
Sciences by N. L. Johnson and F. C. Leone. Copyright © 1964 by John Wiley & Sons, Inc., New 
York. Also reprinted from Biometrika Tables for Statisticians, Vol. 1, ... Pearson and 
Hartley, pp. 130-131. Copyright © 1958. Reprinted by permission of John Wiley & Sons, Inc. and 
the Biometrika Trustees. 
Table A.8 (Continued)

 df   0.750   0.900   0.950   0.975   0.990   0.995   0.999
  1   1.323   2.706   3.841   5.024   6.635   7.879   10.83
  2   2.773   4.605   5.991   7.378   9.210   10.60   13.82
  3   4.108   6.251   7.815   9.348   11.34   12.84   16.27
  4   5.385   7.779   9.488   11.14   13.28   14.86   18.47
  5   6.626   9.236   11.07   12.83   15.09   16.75   20.52
  6   7.841   10.64   12.59   14.45   16.81   18.55   22.46
  7   9.037   12.02   14.07   16.01   18.48   20.28   24.32
  8   10.22   13.36   15.51   17.53   20.09   21.96   26.12
  9   11.39   14.68   16.92   19.02   21.67   23.59   27.88
 10   12.55   15.99   18.31   20.48   23.21   25.19   29.59
 11   13.70   17.28   19.68   21.92   24.72   26.76   31.26
 12   14.85   18.55   21.03   23.34   26.22   28.30   32.91
 13   15.98   19.81   22.36   24.74   27.69   29.82   34.53
 14   17.12   21.06   23.68   26.12   29.14   31.32   36.12
 15   18.25   22.31   25.00   27.49   30.58   32.80   37.70
 16   19.37   23.54   26.30   28.85   32.00   34.27   39.25
 17   20.49   24.77   27.59   30.19   33.41   35.72   40.79
 18   21.60   25.99   28.87   31.53   34.81   37.16   42.31
 19   22.72   27.20   30.14   32.85   36.19   38.58   43.82
 20   23.83   28.41   31.41   34.17   37.57   40.00   45.32
 21   24.93   29.62   32.67   35.48   38.93   41.40   46.80
 22   26.04   30.81   33.92   36.78   40.29   42.80   48.27
 23   27.14   32.01   35.17   38.08   41.64   44.18   49.73
 24   28.24   33.20   36.42   39.36   42.98   45.56   51.18
 25   29.34   34.38   37.65   40.65   44.31   46.93   52.62
 26   30.43   35.56   38.89   41.92   45.64   48.29   54.05
 27   31.53   36.74   40.11   43.19   46.96   49.64   55.48
 28   32.62   37.92   41.34   44.46   48.28   50.99   56.89
 29   33.71   39.09   42.56   45.72   49.59   52.34   58.30
 30   34.80   40.26   43.77   46.98   50.89   53.67   59.70
 40   45.62   51.80   55.76   59.34   63.69   66.77   73.40
 50   56.33   63.17   67.50   71.42   76.15   79.49   86.66
 60   66.98   74.40   79.08   83.30   88.38   91.95   99.61
 70   77.58   85.53   90.53   95.02   100.4   104.2   112.3
 80   88.13   96.58   101.9   106.6   112.3   116.3   124.8
 90   98.65   107.6   113.1   118.1   124.1   128.3   137.2
100   109.1   118.5   124.3   129.6   135.8   140.2   149.4
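The entries of Table A.8 can be checked numerically. The sketch below assumes, as the printed values indicate (for example 3.841 under 0.950 for 1 degree of freedom), that each column heading is the lower-tail probability P(χ² ≤ x). The function names chi2_cdf and chi2_quantile are hypothetical. The code uses only the Python standard library: it evaluates the regularized lower incomplete gamma function by its power series and inverts it by bisection. In practice a library routine such as scipy.stats.chi2.ppf would serve the same purpose.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Lower-tail probability P(X <= x) for a chi-square variable with df degrees of freedom.

    Computed as the regularized lower incomplete gamma function P(df/2, x/2)
    using its power series, which converges for every x > 0.
    """
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series: P(a, z) = z**a * exp(-z) / Gamma(a + 1) * sum_{n>=0} z**n / ((a+1)(a+2)...(a+n))
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    # Work with logarithms so the prefactor does not overflow for large df.
    log_prefactor = a * math.log(z) - z - math.lgamma(a + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def chi2_quantile(p: float, df: int, rel_tol: float = 1e-10) -> float:
    """Value x with P(X <= x) = p, found by bisection on chi2_cdf (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # A few cells of Table A.8: (df, column heading, printed value)
    for df, p, printed in [(1, 0.950, 3.841), (10, 0.990, 23.21), (50, 0.999, 86.66), (100, 0.500, 99.33)]:
        print(f"df={df:3d}  p={p:.3f}  table={printed:>7}  computed={chi2_quantile(p, df):.4f}")
```

Run as a script, it prints each computed value beside the printed table entry. Agreement to the table's printed precision is a quick check against transcription slips. Bisection was chosen over Newton's method because it needs only the distribution function and always converges once the quantile is bracketed.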



INDEX 


Abbe, Cleveland, 218, 222
Absolute experiment, 269, 272
Acceptance sampling, 87, 88
Achenwall, Gottfried, 6
Adams, William J., 108, 113, 145, 147
Addition rule, 72, 76
Adrain, Robert, 194
Airy, G. B., 133
Alternative hypothesis, 169, 172
Ames, 246, 247
Analysis of variance, 259-265, 287-291
Arberry, J., 97, 102
Arbuthnot, J., 167, 172
Aristotle, 4
Ars Conjectandi, 15, 98, 107
Astragalus, 11
Average, 51
Axioms, 71, 72, 78

Bacon, Francis, 19, 20, 23
Bayes, Thomas, 298, 299, 302, 303
Bayesian controversy, 63, 299-303
Bayesian intervals, 162, 163, 181, 184
Bayesian statistics, 161, 300
Bayes’ rule, 76, 77, 313-315
Bayes’ theorem, 76, 77, 299, 302, 303
Bennett, J. H., 248
Berkson, Joseph, 302, 303
Bernoulli, James, 15, 84, 96, 98, 107
Between class SS, 262, 263, 265, 289
Bills of mortality, 5
Binomial coefficients, 98, 102
Binomial distribution, 97-102, 118, 119
Binomial expansion, 97, 102
Biometrics, 306
Biometrics laboratory, 306
Biometrika, 23
Birthday problem, 86, 92
Bishop, Philip W., 19, 24
Bivariate frequency distribution, 189, 194
Bivariate normal distribution, 189-195, 213-215
Boalch, D. H., 248
Bolch, B. W., 342
Boorstin, Daniel, 6, 7
Bortkiewicz, L., 37, 42, 46, 120, 123, 320
Box, Joan Fisher, 308
Bravais, A., 194
Broadbalk field, 245
Bromberger, Sylvain, 19, 24
Buckland, W. R., 98, 102
Bulmer, M. G., 78
Bureau of the Census, 127, 133

Cajori, Florian, 59
Cardano, Gerolamo, 13, 16
Carnap, Rudolph, 29, 32
Cassedy, James H., 5, 7
Census, 3, 4, 6, 127
Center of gravity, 56
Central limit theorem, 139-147
Central tendency, 51, 55-58
Chance mechanisms, 12
Chi-square, 153, 318-324
Class boundaries, 40, 46
Classification, 337
Class mark, 41, 46
Clay tobacco pipe, 39, 45, 236
Cochran, W. G., 269, 272, 292
Cohen, Daniel, 109, 113
Collective, 66, 67
Combinations, 84, 85, 92
Commonalities, 330-332
Comparative experiment, 269, 272
Completely randomized design, 286, 287, 290-292
Concentric ellipses, 192, 194, 199
Conditional distributions, 213
Conditional probability, 74, 75, 78
Confidence intervals, 162, 180, 181, 184
    for β₀, 231, 232
    for β₁, 231, 232
    for entire regression line, 240, 241
    for μ, 180, 181
    for mean Y value in regression, 239-241
Consonance intervals, 182, 279
Continuous measurements, 40, 46
Contour map, 189, 190, 194
Convergence to normality, 140, 145
Correction factor, 289, 291
Correlation, 22, 199, 205
Correlation coefficient, 201-205, 307, 308, 319
Cox, D. R., 269, 272
Cox, Gertrude, 247, 269, 272, 292
Cramer, Harold, 78
Credibility index, 290

Darwin, Charles, 21, 22, 23, 211
Data analysis, 308
David, F. N., 5, 7, 11, 12, 14, 16, 117, 123
Davis, James A., 134
Decision-making, 308
Decision theory, 312, 315
Deductive inference, 160, 163
Degrees of freedom, 153, 261-263, 265, 319, 320, 323
De Mere, Chevalier, 14
De Moivre, Abraham, 15, 20, 107, 108, 113, 117, 121, 122, 138, 140, 147
Description of states, 4
Design of experiments, 28
Deviations, 227
Dice, 11, 12
Discrete measurements, 40, 46
Discriminant functions, 337-342
Dispersion, 55
Distribution of errors, 20
Dodge-Romig tables, 87
Domesday book, 5
Doyle, Arthur Conan, 159
Draper, N. R., 233, 241
Dudley, Lavinia P., 7
Duncan, Acheson J., 87, 92

Edgeworth, Ysidro, 259, 265
Einstein, Albert, 19
Eisenhart, Churchill, 153, 155
Encke, Johann Franz, 133, 134
Enhanced picture, 41, 42, 44
Equally likely events, 83, 91
Error SS, 289-291
Estimates, 30, 32, 52, 162, 163
    of β₀, 226, 227, 232
    of β₁, 226, 227, 232
    of μ, 56, 130
    of σ², 226, 229, 232
Estimation, 312, 315
Event, 71-73, 78
Eves, Howard, 98, 102
Expectation, 53
Expected value, 54, 58
Experimental design, 246, 268, 269-273
Experimental error, 272, 272
Experimental material, 270, 272
Experimental units, 270, 272, 288, 292
Extrasensory perception, 86, 87, 92

F test, 287, 290-292
Factor, 329-332
Factor-loading coefficients, 330-332
Factor structure, 329
Farrington, Benjamin, 7
Featherstone, Ernest, 309
Feller, William, 86, 92
Fermat, Pierre de, 14, 53, 62
Fiducial, 177, 180
Fiducial interval, 180, 184
Fiducial probability, 180, 181
Fisher, R. A., 26, 28, 30, 32, 46, 56, 133, 134, 155, 167, 168, 172, 180, 181, 184, 240, 241, 245, 246, 248, 251, 254, 259, 264, 265, 269, 273, 299, 300, 302, 303, 306-309, 311, 315, 320, 323, 324, 341, 342
Fisz, Marek, 120, 123
Fitzgerald, Edward J., 97, 102
Folks, Leroy, 182, 184, 233, 282
Forrest, D. W., 211, 212, 215
Frequency distributions, 37, 40, 45

Galton, Francis, 22, 23, 108, 109, 113, 194, 198, 201, 205, 210-213, 215, 306, 329, 333
Galton laboratory, 306
Gambling, 13
Games of chance, 11, 250
Gauss, Karl Friedrich, 15, 20, 23, 107, 108, 113, 194, 218, 222, 224
Gaussian curve, 15, 45
Gaussian distribution, 110, 113
Goodness of an estimate, 177
Goodness-of-fit, 162, 319-321, 323
Gosset, W. S., 150-155, 158
Gresham College, 45, 106, 299, 305, 306
Graunt, John, 5
Gridgeman, N. T., 100, 102
Grunbaum, Adolf, 19, 24
Guinness Co., 151, 155
Gur, R. C., 281, 282

Hacking, I., 66, 68, 299, 303
Hald, A., 30, 32
Haldane, J. B. S., 305, 309
Harman, Harry H., 333
Hartley, H. O., 110
Histogram, 41
Hogben, Lancelot, 98, 102
Hooker, Joseph, 211
Hounds and jackals, 11, 70
Huang, C. J., 342
Hume, Ivor Noel, 39, 40, 46, 233

Independence, 76, 78
Independent trials, 98, 99
Index of credibility, 290
Inductive inference, 160, 163
Inference, 159-163
Interval estimation, 180, 182, 184
Intervals, 30, 32, 162, 163, 180-184, 312, 315
IQ, 109, 110, 113

Jeffreys, H., 66, 68
Johnson, N. L., 113, 123
Joint probability, 73, 74, 78

Kaplan, Jerold Z., 281, 282
Kempthorne, O., 182, 184, 233, 248, 269, 273, 282
Kendall, M. G., 12, 16, 98, 102, 190, 192, 193, 195, 307, 309
Kerrick, J. E., 64, 68
Keynes, J. M., 63, 66, 68
King, Amy C., 78
Kjetsaa, Geir, 38, 39, 46
Kollectiv, 28
Kotz, S., 113, 123
Kramp, C., 108

Laplace, 15, 20, 23, 108, 218, 222
Laplace’s principle of insufficient reason, 302
Larson, H. J., 102
Latter, Oswald, 37, 38, 46
Lave, Lester B., 232, 233
Lawes, John Bennet, 27, 245, 247
Law of errors, 107, 306
Law of large numbers, 107
Least squares estimate, 219, 221, 222
Lee, Alice, 189, 191, 194
Legendre, A. M., 216, 218, 222
Leibniz, G. W., 85
Life expectancy, 54, 58
Lindley, D. V., 302, 303
Linear trend, 260
Logic, 159-163
Loss function, 313
Lyman, Howard B., 109, 113

McHenry, Henry M., 341, 342
Mackenzie, W. A., 265
McMullen, Launce, 155
Maddi, Dorothy Lender, 37, 46
Mahalanobis, P. C., 264, 336, 341, 342
Marginal probability, 74, 78
Mather, D., 307, 309
Mean, 51, 57, 58
Mean square, 261, 263-265
Measure of association, 201
Median, 51, 58
Meier, Paul, 275, 282
Mendel, Gregor, 18, 21, 22, 23
Mendenhall, William, 134
Menges, G., 312, 315
Method of least squares, 217-222, 225, 232
Metropolitan Life Assurance Co., 75, 79
Minimax rule, 312, 313, 315
Minimum variance unbiased estimate, 179, 183
Mode, 51, 58
Model, 161, 163, 279, 282, 289
Moment of inertia, 57
Moments, 56, 59
Moral statistics, 21
Morrison, D. F., 342
Mortality rate, 50, 75
Mosteller, F., 46, 79
Mullet, Gary M., 121, 123
Multiple regression, 247
Munford, A. G., 86, 92
Mutually exclusive events, 76, 78

Nagel, Ernest, 19, 24
Neyman, J., 152, 155, 166, 167, 169, 172, 180, 181, 184, 302, 308
Neyman-Pearson theory, 312
Nigel, Dunstone, 232, 233
Noether, G. E., 63, 68
Nonresponse rate, 132
Normal curve, 15
Normal distribution, 45, 46, 107-113, 139, 140, 142, 145, 306
Nourse, Alan E., 232, 233
Novum organum, 19
Null hypothesis, 169-172

Observation and theory, 19
Observed significance level, 168, 169, 172, 231
O’Callaghan, E. B., 4
OC curve, 88
Olby, Robert C., 22, 24
One-way classification, 264
Ott, Lyman, 134
Outcome, 71

Paired design, 275, 276, 279-282, 292
Parameters, 28, 30, 32, 130, 302, 313
Parameter space, 313
Parzen, E., 79
Pascal, Blaise, 10, 14, 16, 53, 82
Pascal’s triangle, 12, 15, 98
Pearl, R., 321
Pearson, E. S., 110, 151, 155, 167, 169, 172, 176, 302, 305, 307-309
Pearson, Karl, 15, 16, 20, 27, 28, 36, 37, 45, 46, 55, 56, 59, 108, 113, 133, 134, 151, 167, 172, 173, 189, 191, 194, 203, 205, 299, 300, 302, 305-309, 319, 323, 324, 329, 332, 333
Pearson product-moment correlation, 203-205
Peirce, C. S., 29, 32
Percentile, 58, 59
Permutations, 84, 92
Pfeiffer, John, 275, 282
Plackett, R. L., 218, 222
Plana, G., 194
Plausibility, 161, 163
Point estimation, 177, 182, 183
Poisson, S. D., 42, 46, 116, 121-123
Poisson distribution, 42, 45, 46, 117-123, 320
Political arithmetic, 5, 7
Polya, G., 139, 147, 217
Pooled df, 278
Pooled SS, 278
Population, 27-30, 161
    mean, 52, 53, 58, 219, 222
    regression line, 211-215
Posterior probabilities, 302, 303
Power, 170
Prediction of Y values, 237-239, 241
Premise, 160, 161, 163
Price, Richard, 299, 302
Principal components, 332
Prior probabilities, 302, 303
Probability:
    classical definition, 84, 92
    limiting frequency, 66, 67, 302
    logical relationship, 67
    measure of belief, 63, 64, 67, 161, 299, 303
    physical property, 63, 64, 67, 161, 299
    subjective, 67
Problem of points, 13, 14
Product rule, 76
Publicistics, 6
Pytkowski, W., 180, 184

Quality control, 87, 92
Quetelet, Adolphe, 21, 23, 108
Quipus, 2, 5

Radius of gyration, 57
Ramsey, F. P., 67, 68
Rand Corporation, 32
Random event, 11
Randomization, 250, 270
Randomized block design, 287, 288, 290-292
Random numbers, 30, 31
Random sample, 29, 32, 161, 163 
Read, Cecil B., 78
Regression, 22, 211-215, 247, 260-262, 265
Relative frequency, 40, 46, 65, 66, 181
Reliability, 89, 90, 92
Replications, 271, 272
Residual SS, 261, 262, 265
Reversion, 212, 215
Risk, 314, 315
Roberts, H. V., 51, 59
Rothamsted, 27, 244-248, 269, 308
Rourke, R. E. K., 79
Russell, E. John, 245, 246, 248

Sackeim, H. A., 281, 282
Salk vaccine, 274, 275
Sample, 27, 28, 32, 161
    small, 152, 155
Sample average, 128
Sample mean, 178, 179, 219, 222
Sample median, 178
Sample space, 71, 72, 78
Sample variance, 56
Sampling, 27, 127, 133
Sampling distribution, 128, 133
Saucy, M. C., 281, 282
Savage, I. R., 63, 68
Scheaffer, Richard L., 134
Schols, C. M., 194
Schwartz, George, 19, 24
Scientific method, 19
Scollar, Irwin, 41, 46
Seskin, Eugene P., 232, 233
Shewhart, Walter, 87
Significance level, 168, 169, 279, 281, 290, 291, 320
Simple random sample, 128, 130, 133
Sinclair, John, 6
Skyrms, Brian, 160, 163
Small samples, 152, 155
Smith, Charles Foster, 51, 59
Smith, D. E., 12, 16, 98
Smith, H., 233, 241
Snedecor, George W., 240, 241, 246-248, 258, 259, 265, 269, 273, 292
Snedecor’s F, 240, 319
Sootin, Harry, 21, 24
Sources of variation, 259-265
Spearman, C., 328, 329, 333
Specificities, 330-332
Staatenkunde, 4
Standard deviation, 55, 57, 58, 306
Standard normal distribution, 111-113, 152
Standardized variable, 329
Statistical inference, 29, 161, 312, 315
Statistical laboratory, 247, 248
Statistical methods, 161, 247, 248
Statistical test, 167, 173
Statistics, 4
Stephan, Frederick J., 27, 32
Stigler, Stephen, 259, 265
Stouffer, Samuel A., 305, 309
Straight line, 199, 200
Stratified random sample, 130, 131, 133
“Student,” 121, 123, 133, 151, 155, 167, 172, 292
“Student’s” t, 151-155, 168, 319
Success and failure, 98, 102
Sum of squares, 259-265
Survey, 4

Tests, 30, 32, 162, 163, 311, 315
    for β₀, 230, 232
    for β₁, 230, 232
    of hypotheses, 162
    for μ, 168, 169, 171
    of significance, 162, 168, 169
Test statistic, 168, 171, 172
Thakur, Shivesh C., 86, 92
Thomas, G. B., 79
Thouless, Robert H., 86, 92
Tippett, L. H. C., 30, 33
Todhunter, I., 13, 14, 16
Total SS, 261, 263, 265
Treatment, 251-255, 270-272, 288-292
Treatment effect, 253, 255
Treatment MS and SS, 289-292
Tree diagrams, 90-92, 100
Trial of the Pyx, 167, 172
Two-group design, 275-279, 281, 282, 292
Two-way tables, 321, 324
Type I and type II error, 170, 172

Unbiased estimate, 130, 131, 133, 177, 183
Univariate normal, 192, 194
University College, 23, 108, 151, 304, 305, 308

Validity, 160, 163
Variance, 56, 58, 59
    of β₀, 229
    of β₁, 229
    of X, 128-130
Variation, 55, 58, 259, 260, 264, 265
Venn, John, 29
von Mises, R., 63, 67, 68

Wakeman, R. John, 281, 282
Wald, A., 310, 311, 315
Walker, Helen M., 12, 13, 15, 16, 24, 51, 56, 58, 59, 107, 113, 194, 305, 309
Wallace, D. L., 46, 247
Wallace, Henry, 247
Wallis, W. A., 51, 59
Weldon, W. F. R., 37, 46
Westergaard, Harald, 4, 7
Wilks, S. S., 305, 309
Wilson, John Rowan, 113
Within class SS, 262-265, 289

Yates, Frank, 30, 32, 307, 309
Yule, G. Udny, 6, 7, 28, 33, 190, 192, 193, 195